I have problems understanding the time usage report below:
1) Why do the times for job steps 1 & 2 not add up to the batch line?
2) What is the relationship between the columns, especially TotalCPU and CPUTime?
3) For the time usage of the job, which one is best to report?
$ sacct -o JOBID,AllocCPUs,AveCPU,reqcpus,systemcpu,usercpu,totalcpu,cputime,cputimeraw -j 649176
       JobID  AllocCPUS     AveCPU  ReqCPUS  SystemCPU    UserCPU   TotalCPU    CPUTime CPUTimeRAW
------------ ---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
649176               24                  24  00:02.047  01:06.896  01:08.943   00:23:36       1416
649176.batch         24   00:00:00       24  00:00.027  00:00.014  00:00.041   00:23:36       1416
649176.0             24   00:00:00       24  00:00.813  00:24.886  00:25.699   00:08:48        528
649176.1             24   00:00:18       24  00:01.207  00:41.996  00:43.203   00:14:24        864
1) Why do the times for job steps 1 & 2 not add up to the batch line?
The SystemCPU, UserCPU and TotalCPU values reported for .batch are the time spent running the commands in the batch file itself, not counting the spawned processes [1]. CPUTime and CPUTimeRAW do count the spawned processes, so the lines for the job steps add up to the .batch value almost exactly (528 + 864 = 1392 of the 1416 CPU-seconds).
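As a quick check of the numbers above, here is a minimal Python sketch that converts the time fields to seconds and compares the sums. The helper `to_seconds` and the dictionaries are my own names for illustration, not part of Slurm, and the values are copied by hand from the output in the question.

```python
def to_seconds(field: str) -> float:
    """Convert a sacct time field of the form [DD-][HH:]MM:SS[.mmm] to seconds."""
    days = 0
    if "-" in field:
        day_part, field = field.split("-", 1)
        days = int(day_part)
    seconds = 0.0
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return days * 86400 + seconds


# TotalCPU and CPUTimeRAW, copied from the sacct output in the question.
total_cpu = {
    "649176": "01:08.943",
    "649176.batch": "00:00.041",
    "649176.0": "00:25.699",
    "649176.1": "00:43.203",
}
cputime_raw = {"649176.batch": 1416, "649176.0": 528, "649176.1": 864}

# TotalCPU: the job line is the sum of the .batch line and the two steps.
job_total = to_seconds(total_cpu["649176"])
parts_sum = sum(
    to_seconds(total_cpu[key]) for key in ("649176.batch", "649176.0", "649176.1")
)
print(round(job_total, 3), round(parts_sum, 3))

# CPUTimeRAW: the two steps cover all but 24 CPU-seconds of the .batch line.
step_gap = cputime_raw["649176.batch"] - (
    cputime_raw["649176.0"] + cputime_raw["649176.1"]
)
print(step_gap)
```

The 24 CPU-second gap corresponds to one second of elapsed time on 24 CPUs, which is presumably time the batch script spent outside the two steps.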
2) what is the relationship between each column, especially for TotalCPU and CPUTime?
TotalCPU is the sum of UserCPU and SystemCPU over all CPUs, while CPUTime is the elapsed time multiplied by the number of requested CPUs. The difference between the two is the time the CPUs spent doing nothing (neither in user mode nor in kernel mode), most often waiting for I/O [2].
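To make that relationship concrete, the sketch below works backwards from the numbers in the question. Elapsed was not requested in the sacct call, so it is derived here as CPUTimeRAW divided by the CPU count, which is an assumption based on the definition above; the function names are hypothetical.

```python
ALLOC_CPUS = 24  # AllocCPUS from the sacct output


def derived_elapsed(cputime_raw: float, cpus: int) -> float:
    """Elapsed seconds implied by CPUTime = Elapsed * number of CPUs."""
    return cputime_raw / cpus


def idle_cpu_seconds(cputime_raw: float, total_cpu: float) -> float:
    """CPU-seconds spent in neither user nor kernel mode."""
    return cputime_raw - total_cpu


# (CPUTimeRAW, TotalCPU in seconds), copied from the sacct output.
rows = {
    "649176": (1416, 68.943),
    "649176.0": (528, 25.699),
    "649176.1": (864, 43.203),
}

for job_id, (cputime_raw, total_cpu) in rows.items():
    elapsed = derived_elapsed(cputime_raw, ALLOC_CPUS)
    idle = idle_cpu_seconds(cputime_raw, total_cpu)
    print(f"{job_id}: elapsed={elapsed:.0f}s idle={idle:.3f} CPU-seconds")
```

For the whole job this gives 59 seconds of elapsed time, with 1347.057 of the 1416 allocated CPU-seconds spent idle.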
3) for time usage of the job, which one is best to report?
It depends on what you want to show. Elapsed (which you did not show here) gives the "time to solution". CPUTimeRAW is what is usually accounted and billed for. The difference between CPUTime and TotalCPU gives information about the I/O overhead.
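If you want these as numbers for a report, a rough sketch could look like the following. The function names are made up for illustration, and whether your site actually bills by CPUTimeRAW is something to confirm with your administrators.

```python
def core_hours(cputime_raw: float) -> float:
    """Allocated core-hours, i.e. CPUTimeRAW (CPU-seconds) converted to hours."""
    return cputime_raw / 3600


def cpu_efficiency(total_cpu: float, cputime_raw: float) -> float:
    """Fraction of the allocated CPU time that was actually spent computing."""
    return total_cpu / cputime_raw


# Job-level values from the sacct output: CPUTimeRAW=1416, TotalCPU=01:08.943.
job_cputime_raw = 1416
job_total_cpu = 68.943  # seconds

print(f"core-hours: {core_hours(job_cputime_raw):.4f}")
print(f"CPU efficiency: {cpu_efficiency(job_total_cpu, job_cputime_raw):.1%}")
```

For the time to solution, add Elapsed to the `-o` field list of the sacct call and report it directly.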
[1] From the sacct man page:
SystemCPU The amount of system CPU time used by the job or job step. The format of the output is identical to that of the Elapsed field.
NOTE: SystemCPU provides a measure of the task’s parent process and does not include CPU time of child processes.