Let's say I have a loop in Bash:
for foo in `some-command`
do
    do-something $foo
done
do-something is CPU-bound and I have a nice shiny 4-core processor. I'd like to be able to run up to 4 do-somethings at once.
The naive approach seems to be:
for foo in `some-command`
do
    do-something $foo &
done
This will run all the do-somethings at once, but there are a couple of downsides. The main one is that do-something may also do significant I/O, and running everything at once might slow things down. The other problem is that this code block returns immediately, so there is no way to do other work once all the do-somethings have finished.
How would you write this loop so that there are always X do-somethings running at once?
Depending on what you want to do, xargs can also help (here: converting documents with pdf2ps):
cpus=$( ls -d /sys/devices/system/cpu/cpu[[:digit:]]* | wc -w )
find . -name \*.pdf | xargs --max-args=1 --max-procs=$cpus pdf2ps
From the docs:
--max-procs=max-procs
-P max-procs
Run up to max-procs processes at a time; the default is 1.
If max-procs is 0, xargs will run as many processes as possible at a
time. Use the -n option with -P; otherwise chances are that only one
exec will be done.
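Applied to the loop from the question, the same idea might look like the sketch below. Here some_command is a hypothetical stand-in for some-command, and the small sh -c snippet stands in for do-something; neither comes from the original post. The short options -n and -P are the equivalents of --max-args and --max-procs. -P is not part of POSIX, so this assumes an xargs that supports it, such as GNU or BSD xargs.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the question's some-command: emits one item per line.
some_command() { printf '%s\n' a b c d e f g h; }

# -n 1 passes one item to each invocation, -P 4 keeps at most 4 running at once.
# The sh -c snippet is a placeholder for do-something; "_" fills $0 so that the
# item arrives as $1.
# xargs itself only exits once every child has finished, so anything after this
# line runs when all the work is done.
results=$(some_command | xargs -n 1 -P 4 sh -c 'echo "done $1"' _)

echo "all jobs finished"
```

Because the jobs run in parallel, the order of their output is not guaranteed. Items containing whitespace or quotes would need extra care, for example NUL-separated input where both the producer and xargs support it.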