I have access to exactly 100 nodes. After submitting jobs to all 100 nodes, I want to sleep for 180 seconds and then check how many jobs are still in the queue. If any jobs are queued, the script should print the number of pending jobs. If all jobs are running on the 100 nodes, it should submit new jobs, and keep doing so until all jobs are finished. Once all jobs are finished, the script should exit the while loop.
I have written the following bash code:
n=1
while [ $n -gt 0 ]; do
    if (($(qselect -u username | grep 'Q' | wc -l) > 0)); then
        echo "Jobs in Queue=$(qselect -u username | grep 'Q' | wc -l)"
    else
        python parallel_jobs.py
        n=$(qselect -u username | grep 'Q' | wc -l)
    fi
    sleep 180
done
However, the script exits the while loop after only one pass, contrary to my expectation.
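As far as I can tell, qselect prints only job IDs and not the job state, so grep 'Q' never matched anything and n became 0 after the first pass. I got it to work by switching to qstat, whose output includes the job state column: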
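#!/bin/bash
n=1
while [ "$n" -gt 0 ]; do
    # The qstat header line presumably also matches "Q" (the "Queue" column),
    # hence the comparison against 1 and the "-1" below.
    if [ "$(qstat -u username | grep -c Q)" -gt 1 ]; then
        echo "Jobs in Queue=$(($(qstat -u username | grep -c Q)-1))"
    else
        python parallel_jobs.py
        n=$(($(qstat -u username | grep -c Q)-1))
    fi
    sleep 180  # wait before polling again, as in the original loop
done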
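
Counting lines that contain the letter Q anywhere is a little fragile, since a job name or queue name with a capital Q would also match. A possibly more robust variant is sketched below. It assumes that the job state is the second-to-last column of the `qstat -u` output, as in the usual Torque/PBS layout; the helper name `count_queued` is made up for illustration, and the column position may differ on your system.

```shell
#!/bin/bash
# Count queued jobs from `qstat -u <user>` output read on stdin.
# Assumption: the job state is the second-to-last column, as in the
# usual Torque/PBS `qstat -u` layout; adjust if your site differs.
# count_queued is a hypothetical helper name, not part of PBS.
count_queued() {
    # NF >= 2 skips blank and one-word lines; header lines are skipped
    # because their second-to-last field is not exactly "Q".
    awk 'NF >= 2 && $(NF-1) == "Q" { n++ } END { print n + 0 }'
}

# Hypothetical usage inside the polling loop (needs a PBS client):
#   queued=$(qstat -u username | count_queued)
# Alternatively, qselect can filter by state itself:
#   queued=$(qselect -u username -s Q | wc -l)
```

With this helper the count no longer includes the header line, so the loop would compare against 0 instead of 1 and would not need the `-1` correction.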