Tags: slurm, lsf

How can I get detailed job run info from SLURM (e.g. like that produced for "standard output" by LSF)?


When submitting with bsub under LSF, the -o option produced a lot of detail, such as when the job started and ended and how much memory and CPU time it used. With SLURM, all I get is the same standard output I'd see from running the script on its own, outside any batch system.

For example, given this Perl 6 script:

warn  "standard error stream";
say  "standard output stream";

Submitted thus:

sbatch -o test.o%j -e test.e%j -J test_warn --wrap 'perl6 test.p6'

This resulted in the file test.o34380:

Testing standard output

and the file test.e34380:

Testing standard Error  in block <unit> at test.p6:2

With LSF, I'd get all kinds of details in the standard output file, something like:
Sender: LSF System <lsfadmin@my_node>
Subject: Job 347511: <test> Done

Job <test> was submitted from host <my_cluster> by user <username> in cluster <my_cluster_act>.
Job was executed on host(s) <my_node>, in queue <normal>, as user <username> in cluster <my_cluster_act>.
</home/username> was used as the home directory.
</path/to/working/directory> was used as the working directory.
Started at Mon Mar 16 13:10:23 2015
Results reported at Mon Mar 16 13:10:29 2015

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
perl6 test.p6

------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :    0.19 sec.
    Max Memory :    0.10 MB
    Max Swap   :    0.10 MB

    Max Processes  :         2
    Max Threads    :         3

The output (if any) follows:

Testing standard output

PS:

Read file <test.e_347511> for stderr output of this job.

Update:

Adding one or more -v flags to sbatch gives more preliminary information, but it doesn't change the job's standard output.
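
For example, the submission from the question with extra verbosity (this just adds -vv; everything else is unchanged):

sbatch -vv -o test.o%j -e test.e%j -J test_warn --wrap 'perl6 test.p6'

The extra messages describe what sbatch itself does at submission time, not what the job did.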

Update 2:

Use seff JOBID (where JOBID is the actual job number) for the desired info. Just be aware that the data is only collected once a minute, so seff might report a peak memory usage of 2.2 GB even though your job was killed for using more than the 4 GB of memory you requested.
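
seff summarizes SLURM's accounting records, so if accounting is enabled on your cluster you can also pull the raw numbers yourself with sacct once the job has finished. A hedged example, using the job ID from above and standard sacct format fields:

sacct -j 34380 --format=JobID,Elapsed,TotalCPU,MaxRSS,State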


Solution

  • At the end of each job script, I insert

    sstat -j $SLURM_JOB_ID.batch --format=JobID,MaxVMSize

    to add RAM usage to the standard output.
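
    For context, a minimal sketch of what this might look like inside a submission script (the #SBATCH options mirror the flags from the question, and perl6 test.p6 stands in for the real workload):

    #!/bin/bash
    #SBATCH -J test_warn
    #SBATCH -o test.o%j
    #SBATCH -e test.e%j

    # The actual work of the job
    perl6 test.p6

    # Append peak memory of the batch step to this job's standard output;
    # MaxVMSize and MaxRSS are standard sstat format fields.
    sstat -j "${SLURM_JOB_ID}.batch" --format=JobID,MaxVMSize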