slurm sacct

slurm: unable to get job's information using scontrol


When I run the following command, I can see a number of Slurm jobs. Since they are listed, I assume their logs have been saved.

$ sacct --format="JobID,JobName%30"                          
       JobID                        JobName
------------ ------------------------------
3            19kuX6ge4WzE2cyRtAUozP1SSE9HR+
3.batch                               batch
4            19kuX6ge4WzE2cyRtAUozP1SSE9HR+
4.batch                               batch
5            19kuX6ge4WzE2cyRtAUozP1SSE9HR+
5.batch                               batch
9.batch                               batch
2                                    run.sh
2.batch                               batch

$ sacct --jobs=4                                             
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
4            19kuX6ge4+      debug      alper          1  COMPLETED      0:0
4.batch           batch                 alper          1  COMPLETED      0:0

Afterwards, when I run scontrol show job <job_id>, it does not return the job's information.

$ scontrol show job 4                                       
slurm_load_jobs error: Invalid job id specified

What could be the reason for this? Is there an alternative way to fetch a job's information, such as its RunTime?


Solution

  • scontrol only shows information about jobs that are currently running or recently finished. How long a finished job stays visible depends on the installation (the MinJobAge setting in slurm.conf); the default is 300 seconds, i.e. 5 minutes. sacct returns information from the accounting database, so it works for all jobs, including ones that scontrol no longer knows about.
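
As a rough sketch of how the run time could be fetched with sacct instead: the Elapsed, Start, End and State fields are standard sacct format fields, and job ID 4 is taken from the question. The helper elapsed_to_seconds is hypothetical (it is not part of Slurm) and assumes the Elapsed value has the form [DD-]HH:MM:SS with an unpadded day count.

```shell
# Hypothetical helper, not part of Slurm: convert an Elapsed value of the
# assumed form [DD-]HH:MM:SS into whole seconds.
elapsed_to_seconds() {
    _t=$1
    _d=0
    # Split off the optional day count in front of the dash.
    case $_t in
        *-*) _d=${_t%%-*}; _t=${_t#*-} ;;
    esac
    _h=${_t%%:*}; _t=${_t#*:}
    _m=${_t%%:*}
    _s=${_t#*:}
    # Strip one leading zero from the two-digit fields so that values
    # such as "08" are not parsed as octal by the shell.
    echo $(( _d * 86400 + ${_h#0} * 3600 + ${_m#0} * 60 + ${_s#0} ))
}

# Query the accounting database only if sacct is installed on this machine.
if command -v sacct >/dev/null 2>&1; then
    sacct --jobs=4 --noheader --parsable2 --format=JobID,Elapsed,Start,End,State
fi
```

For example, elapsed_to_seconds 00:01:30 prints 90. The --parsable2 option separates fields with "|", which makes the Elapsed column easy to cut out and pass to the helper.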