As you all know, when you fork, the child gets a copy of everything, including the parent's open file and network descriptors (see man fork).
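To see the descriptor inheritance in isolation, here is a minimal sketch. It assumes PHP's CLI with the pcntl extension loaded; `demo_shared_descriptor` is a made-up name, and a temporary file stands in for a network socket.

```php
<?php
// Sketch: a handle opened before pcntl_fork() is still usable in the child.
// Assumes the CLI SAPI with the pcntl extension; the temp file is illustrative only.

function demo_shared_descriptor(): string
{
    $path = tempnam(sys_get_temp_dir(), 'fork');
    $fh = fopen($path, 'w');            // opened once, before forking

    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }
    if ($pid === 0) {
        // Child: writes through the handle it inherited from the parent.
        fwrite($fh, "written by child\n");
        fflush($fh);
        exit(0);
    }

    // Parent: wait for the child, then read back what it wrote.
    pcntl_waitpid($pid, $status);
    fclose($fh);
    $contents = file_get_contents($path);
    unlink($path);

    return $contents;
}
```

Calling `demo_shared_descriptor()` should return the line the child wrote, even though only the parent ever called `fopen`. A database connection is inherited in the same way, which is where the trouble starts.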
In PHP, when you use pcntl_fork, all of the connections you created with mysql_connect are copied too, and this is somewhat of a problem (see the PHP docs and the related SO question). Common sense in this situation says: close the parent's connection, create a new one, and let the child use the old one. But what if the parent needs to create many children every few seconds? In that case you end up creating loads of new connections, one for every bunch of forks.
Here is what that means in code:
while (42) {
    // a fresh connection on every iteration
    $db = mysql_connect($host, $user, $pass);
    // do some stuff with $db
    // ...
    foreach ($jobs as $job) {
        if (($pid = pcntl_fork()) == -1) {
            continue; // fork failed, skip this job
        } else if ($pid) {
            continue; // parent: move on to the next job
        }
        fork_for_job($job); // child: never returns
    }
    mysql_close($db);
    wait_children();
    sleep(5);
}

function fork_for_job($job) {
    // do something.
    // does not use the global $db
    // ...
    exit(0);
}
Well, I do not want to do that: that's way too many connections to the database. Ideally I would like to achieve behaviour similar to this:
// one connection, opened once and kept for the parent's whole lifetime
$db = mysql_connect($host, $user, $pass);
while (42) {
    // do some stuff with $db
    // ...
    foreach ($jobs as $job) {
        if (($pid = pcntl_fork()) == -1) {
            continue; // fork failed, skip this job
        } else if ($pid) {
            continue; // parent: move on to the next job
        }
        fork_for_job($job); // child: never returns
    }
    wait_children();
    sleep(5);
}

function fork_for_job($job) {
    // do something
    // does not use the global $db
    // ...
    exit(0);
}
Do you think it is possible?
Some other things:
The only thing you could try is to have each child wait until every other child has finished its job. That way they could all use the same database connection (provided there aren't any synchronization issues). But of course you would then have a lot of processes alive at once, which is not very good either (in my experience PHP uses quite a lot of memory). If having multiple processes access the same database connection is not a problem, you could instead make "groups" of processes which share a connection. Then you don't have to wait until every job has finished (you can clean up as soon as a whole group is done), and you don't have a lot of connections either.
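A rough sketch of the "groups" idea, assuming the pcntl extension. The function name `run_in_groups` and the `$connect`, `$close` and `$worker` callables are hypothetical stand-ins for your own connection handling and job code. The sketch only shows the batching and cleanup order; it does nothing to make sharing a live connection between processes safe.

```php
<?php
// Sketch only: $connect, $close and $worker are hypothetical callables.
// One connection is opened per group of jobs and closed once the group is done.

function run_in_groups(array $jobs, int $groupSize, callable $connect, callable $close, callable $worker): int
{
    $groups = 0;
    foreach (array_chunk($jobs, $groupSize) as $group) {
        $conn = $connect();               // one connection per group
        $groups++;

        $pids = [];
        foreach ($group as $job) {
            $pid = pcntl_fork();
            if ($pid === -1) {
                continue;                 // fork failed, skip this job
            }
            if ($pid === 0) {
                $worker($job, $conn);     // child: sees the inherited connection
                exit(0);
            }
            $pids[] = $pid;               // parent: remember the child
        }

        // Clean up only after the whole group has finished.
        foreach ($pids as $pid) {
            pcntl_waitpid($pid, $status);
        }
        $close($conn);
    }

    return $groups;                       // number of connections that were opened
}
```

With five jobs and a group size of two this opens three connections instead of five, so the group size is the knob that trades connection count against how long a connection is shared.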
You should ask yourself whether you really need a database connection in your worker processes. Why not let the parent fetch the data and have the workers write their results to a file?
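As a sketch of that file-based approach, assuming the pcntl extension: the parent hands each child its input at fork time, the child writes its result to a temporary file, and the parent collects the files afterwards. The name `run_jobs_via_files` and the `$work` callable are made up for illustration, and JSON is just one possible result format.

```php
<?php
// Sketch: workers never touch the database; they report back through temp files.
// run_jobs_via_files and $work are hypothetical names, not an existing API.

function run_jobs_via_files(array $jobs, callable $work): array
{
    $children = [];
    foreach ($jobs as $key => $job) {
        $path = tempnam(sys_get_temp_dir(), 'job');
        $pid = pcntl_fork();
        if ($pid === -1) {
            unlink($path);                // fork failed, skip this job
            continue;
        }
        if ($pid === 0) {
            // Child: compute the result and write it to its own file.
            file_put_contents($path, json_encode($work($job)));
            exit(0);
        }
        $children[$pid] = [$key, $path];  // parent: remember where the result goes
    }

    $results = [];
    foreach ($children as $pid => [$key, $path]) {
        pcntl_waitpid($pid, $status);     // wait until this child is done
        $results[$key] = json_decode(file_get_contents($path), true);
        unlink($path);
    }

    return $results;
}
```

The parent can then write the collected results to the database over its single connection, so the children's exit never interferes with it.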
If you do need the connection, you should consider using another language for the job. PHP's CLI is not a "typical" use case (it was only added in 4.3), and multiprocessing is more of a hack than a supported feature.