I have a question about parallel processing in shell scripting. I have a program, myProgram, which I wish to run multiple times, in a loop within a loop. The script is basically this:
MYPATHDIR=`ls $MYPATH`
for SUBDIRS in $MYPATHDIR; do
    SUBDIR_FILES=`ls $MYPATH/$SUBDIRS`
    for SUBSUBDIRS in $SUBDIR_FILES; do
        find $MYPATH/$SUBDIRS/$SUBSUBDIRS | ./myProgram $MYPATH/$SUBDIRS/outputfile.dat
    done
done
What I wish to do is to take advantage of parallel processing. So I tried this for the middle line, to start all the myPrograms at once:
(find $MYPATH/$SUBDIRS/$SUBSUBDIRS | ./myProgram $MYPATH/$SUBDIRS/outputfile.dat &)
However, this began all 300 or so calls to myProgram simultaneously, causing RAM issues etc.
What I would like to do is to run each occurrence of myProgram in the inner loop in parallel, but wait for all of these to finish before moving on to the next outer loop iteration. Based on the answers to this question, I tried the following:
for SUBDIRS in $MYPATHDIR; do
    SUBDIR_FILES=`ls $MYPATH/$SUBDIRS`
    for SUBSUBDIRS in $SUBDIR_FILES; do
        (find $MYPATH/$SUBDIRS/$SUBSUBDIRS | ./myProgram $MYPATH/$SUBDIRS/outputfile.dat &)
    done
    wait $(pgrep myProgram)
done
But I got the following warning/error, repeated multiple times:
./myScript.sh: line 30: wait: pid 1133 is not a child of this shell
...and all the myPrograms were started at once, as before.
What am I doing wrong? What can I do to achieve my aims? Thanks.
() invokes a subshell, which then invokes find/myProgram, so you're dealing with "grandchildren" processes. You can't wait on grandchildren, only direct descendants (aka children).
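One way to apply this, as a rough sketch: drop the parentheses so each pipeline is backgrounded directly by the script's own shell, then call wait with no arguments at the end of each outer iteration. The my_program function and the temporary directory tree below are hypothetical stand-ins (not from the question) so the example runs on its own; in the real script you would call ./myProgram with its own output file argument instead.

```shell
#!/bin/sh
# Hypothetical stand-in for ./myProgram: reads paths on stdin and
# writes the number of lines it received to the file named in $1.
my_program() {
    wc -l > "$1"
}

# Throwaway tree standing in for $MYPATH (an assumption for this demo).
MYPATH=$(mktemp -d)
mkdir -p "$MYPATH/a/x" "$MYPATH/a/y" "$MYPATH/b/x" "$MYPATH/b/y"

for subdir in "$MYPATH"/*/; do
    for subsubdir in "$subdir"*/; do
        # No surrounding ( ): the background pipeline is a direct child
        # of this shell, so wait can see it.
        find "$subsubdir" | my_program "${subsubdir%/}.count" &
    done
    # With no arguments, wait blocks until all background children exit,
    # so the next outer iteration starts only after this batch is done.
    wait
    echo "finished ${subdir%/}"
done
```

Each stand-in job here writes to its own file to keep the demo free of write races; whether several myProgram instances can safely share one outputfile.dat depends on the program itself.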