
Slurm command to see node ID


I have a program that crashes for unknown reasons on a cluster, and I suspect the problem may be related to one or more specific nodes. Is there a command that shows which nodes (node IDs) a completed job ran on? I would like to check whether the job always happens to run on the same nodes.


Solution

  • The sacct command can be used to query the accounting database:

    sacct --start=2024-10-01 --format jobid,state,nodelist
    

    The --format option selects the columns to display. The --start option lets you look at past jobs; without it (and without -j), sacct only shows jobs from the current day, starting at midnight. To inspect a single job, use -j <jobid> instead.
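To see whether failures cluster on particular nodes, you can count how often each node appears among failed jobs. The sketch below assumes the output comes from `sacct -X --noheader --parsable2 --start=2024-10-01 --format=JobID,State,NodeList` (`-X` keeps only the job allocations, not individual steps; `--parsable2` gives `|`-separated fields). The function names `expand_hostlist` and `count_nodes` are hypothetical, and the hostlist expansion is a simplified hand-rolled version that handles only one bracket group per name. On the cluster itself, `scontrol show hostnames <nodelist>` expands Slurm node lists properly.

```python
import re
from collections import Counter


def split_top_level(expr):
    """Split a Slurm node list on commas that are not inside [...]."""
    parts, buf, depth = [], "", 0
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(buf)
            buf = ""
        else:
            buf += ch
    if buf:
        parts.append(buf)
    return parts


def expand_hostlist(expr):
    """Expand e.g. 'node[01-03,07],gpu5' into individual host names.

    Simplified (assumption): only one bracket group per host pattern.
    """
    hosts = []
    for part in split_top_level(expr):
        m = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", part)
        if not m:
            hosts.append(part)  # plain host name, no range
            continue
        prefix, ranges, suffix = m.groups()
        for rng in ranges.split(","):
            if "-" in rng:
                lo, hi = rng.split("-")
                width = len(lo)  # preserve zero padding such as "01"
                for i in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{i:0{width}d}{suffix}")
            else:
                hosts.append(f"{prefix}{rng}{suffix}")
    return hosts


def count_nodes(sacct_output, states):
    """Count node occurrences for jobs whose state is in `states`."""
    counts = Counter()
    for line in sacct_output.strip().splitlines():
        jobid, state, nodelist = line.split("|")
        # States can carry extra text, e.g. "CANCELLED by 1234".
        if state.split()[0] not in states:
            continue
        if nodelist in ("", "None assigned"):
            continue  # job never got an allocation
        counts.update(expand_hostlist(nodelist))
    return counts


# Made-up sample in the assumed sacct --parsable2 format.
sample = """\
101|FAILED|node[01-02]
102|COMPLETED|node03
103|FAILED|node[02,05]
104|PENDING|None assigned
"""

counts = count_nodes(sample, {"FAILED"})
print(counts.most_common())  # [('node02', 2), ('node01', 1), ('node05', 1)]
```

Comparing the counts for FAILED jobs with those for COMPLETED jobs gives a rough idea of whether a node is overrepresented among the failures or simply used often.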