python lsf embarrassingly-parallel

What does jug status 'Active' mean, and why does it not equal the number of procs requested?


I've been unable to find out what the 'Active' status means in the output of jug status. I'm using jug 2.1.1, and I don't see that word anywhere in the manual, except in a footnote about 'active-wait'.

I'm using an LSF job array to run a large number (hundreds of thousands) of minutes-long, single-core jobs. Jobs do move from 'Ready' to 'Complete', and none are listed as 'Failed' or 'Waiting'. Peculiarly, though, the output of jug status has no 'Running' column (which I've seen in the worked examples) and instead has a column called 'Active'. The number of active tasks varies, but stays between 800 and 950 for an LSF array with 2000 elements. According to LSF (output of bjobs -r), every element of the job array has status 'RUN'. I have not checked exhaustively, but when I manually ssh to a node that some of my jobs have landed on and run 'htop', I see the expected number of processes, each pinning an available core. Since this amounts to a spot check, it is still conceivable that some processes in my job array are not doing this.

Does Running == Active for the output of jug status? Am I failing to use about 1100 processors that I am nonetheless occupying with nominally single-threaded jobs?

Thanks for the input. Happy to provide more details as needed.


Solution

  • (author of jug here): It does mean "jobs running right now".

    If you are using the file backend and running thousands of jobs simultaneously, the count may simply be out of sync. jug status first lists the locks and then goes through the list of jobs; a job that finished in between is no longer counted as running, and a job that started in between is not seen as running because its lock was not in the listing. Also, the listing of locks can be out of sync on a network filesystem (this should not matter for actually creating locks, but that process is much slower and we do not wish to pay that cost for jug status).

    This should be much less serious with the redis backend, btw.
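
The undercounting described in the answer can be illustrated with a toy model. The sketch below is not jug's code and does not call jug; the function name `simulate_status_scan` and all of its parameters are hypothetical. It assumes every worker is busy at all times, that the lock listing is a snapshot taken at time 0, and that the scan reaches each listed task at a uniformly random moment during the scan. A task is counted as active only if it was in the snapshot and is still running when the scan reaches it.

```python
import random


def simulate_status_scan(n_workers=2000, task_minutes=3.0,
                         scan_minutes=2.0, seed=0):
    """Toy model of a status scan that works from a stale lock snapshot.

    Hypothetical illustration only: this is NOT how jug is implemented.
    All n_workers are genuinely busy for the whole scan, yet the scan
    reports fewer, because tasks that finish before they are checked
    are replaced by new tasks that are missing from the snapshot.
    """
    rng = random.Random(seed)
    counted_active = 0
    for _ in range(n_workers):
        # Time left on the task this worker held when locks were listed.
        remaining = rng.uniform(0.0, task_minutes)
        # Moment at which the scan gets around to checking that task.
        checked_at = rng.uniform(0.0, scan_minutes)
        # Still running when checked: counted. Otherwise the worker has
        # moved on to a task whose lock is not in the snapshot.
        if remaining >= checked_at:
            counted_active += 1
    return counted_active


if __name__ == "__main__":
    for scan in (0.0, 1.0, 2.0, 3.0):
        n = simulate_status_scan(scan_minutes=scan)
        print(f"scan takes {scan:.1f} min -> {n} of 2000 counted as active")
```

In this model an instantaneous scan counts every worker, and the reported number drops as the scan gets slower relative to the task length, even though no processor is ever idle. Whether this alone explains a count of 800 to 950 out of 2000 depends on how long the scan takes on the filesystem in question, which the model does not try to predict.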