filesystems nfs hpc supercomputers lustre

What is scratch space / a scratch filesystem in HPC?


I am studying HPC applications and parallel filesystems, and I came across the terms "scratch space" and "scratch filesystem".

I cannot picture where this scratch space lives. Is it on the compute node, as a mounted filesystem such as /scratch, or on the main storage system?

What are its contents?

Is the scratch space independent on each compute node, or can two or more nodes share a single scratch space?

Let's say I have a file 123.txt that I want to process in parallel. Will the scratch space contain parts of this file, or will the whole file be copied?

I am confused, and I cannot find a clear description anywhere on Google. Please point me to one.

Thanks a lot.


Solution

  • It all depends on how the cluster was set up and what its users need. When you are given access to a cluster, you should also be given documentation on how it is meant to be used, and that should answer most of your questions.

    On one of the clusters I work with, NFS is used for long-term storage and some Lustre space is available as job scratch space. Both the NFS and the Lustre filesystems are visible from all of the nodes. Each node also has some local scratch space that only that node can see.

    If you want your job to work on 123.txt in parallel, you can copy 123.txt to the shared scratch space (Lustre), or you can copy it to the local scratch space of each node from within your job file:

    # Copy the input to /scratch on every distinct node assigned to the job
    for i in $(sort -u "$PBS_NODEFILE"); do scp 123.txt "$i":/scratch; done
    

    Once each node has a copy, you can run your job. When the job finishes, copy your results to persistent storage, because clusters often run scripts that clean up scratch space.
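
The stage-in, compute, stage-out pattern described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `PERSIST_DIR`, `SCRATCH_DIR`, and `JOB_DIR` are hypothetical names, temporary directories stand in for the real NFS and scratch paths so the script runs anywhere, and the "compute" step is a placeholder line count. Use the paths your site's documentation gives you.

```shell
#!/bin/sh
# Sketch of a stage-in / compute / stage-out job, not a site-specific recipe.
set -eu

# Hypothetical locations: temporary directories stand in for persistent
# storage (e.g. NFS) and for scratch space (node-local or Lustre).
PERSIST_DIR="${PERSIST_DIR:-$(mktemp -d)}"
SCRATCH_DIR="${SCRATCH_DIR:-$(mktemp -d)}"

# Create a small sample input if none exists, so the sketch is runnable.
[ -f "$PERSIST_DIR/123.txt" ] || printf 'a\nb\nc\n' > "$PERSIST_DIR/123.txt"

# 1. Stage in: copy the input into a per-job directory on scratch.
JOB_DIR="$SCRATCH_DIR/job.$$"
mkdir -p "$JOB_DIR"
cp "$PERSIST_DIR/123.txt" "$JOB_DIR/"

# 2. Compute: placeholder work that counts the lines of the input.
wc -l < "$JOB_DIR/123.txt" | tr -d ' ' > "$JOB_DIR/result.txt"

# 3. Stage out: copy results back before scratch is cleaned up.
cp "$JOB_DIR/result.txt" "$PERSIST_DIR/result.txt"

# 4. Clean up the per-job scratch directory.
rm -rf "$JOB_DIR"
```

Using a per-job directory (here named after the shell's process ID) keeps concurrent jobs from overwriting each other's files on a shared scratch filesystem.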