php rest curl parallel-processing curl-multi

Count how many curl_multi requests have been made


I want to know when each curl request has been made when using curl_multi.

My code so far:

$i = 0;
do {
    $status = curl_multi_exec($mh, $running);
    if($running){
        curl_multi_select($mh);
    }
    $info = curl_multi_info_read($mh);
    if ($info !== false) {
        $i++;
        if ($i % 10 === 0 && $i>0)
            logTimeElapsed();
    }
} while ($running && $status == CURLM_OK);

I'm copying code mostly from here.

My tests of it have been strangely inconsistent. It seems to work only sometimes, perhaps when I make a small number of requests. So the issue might actually be with the endpoint. If that is the case, how do I keep track of whether the endpoint has refused my request?


Solution

  • My tests of it have been strangely inconsistent

    Yeah, no wonder. There's a bug in your code: if several handles finish in the same exec() call, you will only read the message from the first of them, and the messages from the rest of the handles will be lost! (Edit: it's not your fault, the bug comes from the code you copied from https://www.php.net/manual/en/function.curl-multi-info-read.php , someone should fix the docs! Edit 2: fixed the docs: https://github.com/php/doc-en/pull/102 ) Let's say that before

        $status = curl_multi_exec($mh, $running);
    

    $running is 2, and after it has returned, $running is 0. Then

    $info = curl_multi_info_read($mh);
    if ($info !== false) {
        $i++;
        if ($i % 10 === 0 && $i>0)
            logTimeElapsed();
    }
    

    will only read the first message, and the message from the second handle will be ignored forever! You need to read all the messages, no matter whether there is 1 message or 100. To do that, use while():

    while(false!==($info=curl_multi_info_read($mh))){
    

    Then all messages will be read. You should probably also remove completed handles from $mh, so add

    while(false!==($info=curl_multi_info_read($mh))){
        curl_multi_remove_handle($mh,$info['handle']);
    

    That way completed handles are no longer attached to $mh, so curl_multi_exec() has a smaller list to iterate and less work to do.
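
    To tie the two steps together, here is a minimal sketch of a helper that drains every pending message and removes each finished handle. The function name drainCompleted() and the shape of the returned array are my own assumptions, not something from the question or the PHP manual. Since the question also asks how to tell whether the endpoint refused a request, the sketch records the transfer result code from the message ('result') and the HTTP status code for each handle. A connection that could not be made shows up as a non-CURLE_OK result, while an HTTP-level rejection would show up in the status code.

```php
<?php
/**
 * Read ALL pending messages from a multi handle and detach the finished
 * easy handles. Hypothetical helper, written for illustration only.
 *
 * @param resource|CurlMultiHandle $mh
 * @return array one entry per finished request
 */
function drainCompleted($mh): array
{
    $finished = [];

    // One curl_multi_exec() call can finish several handles,
    // so keep reading until the message queue is empty.
    while (false !== ($info = curl_multi_info_read($mh))) {
        if ($info['msg'] !== CURLMSG_DONE) {
            continue;
        }
        $ch = $info['handle'];

        $finished[] = [
            'url'       => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
            // Transport-level outcome (CURLE_OK means the transfer itself succeeded).
            'ok'        => $info['result'] === CURLE_OK,
            'error'     => curl_strerror($info['result']),
            // HTTP status code; 0 if no HTTP response was received.
            'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        ];

        // Detach the finished handle so the multi handle has less to track.
        curl_multi_remove_handle($mh, $ch);
    }

    return $finished;
}
```

    Counting the returned entries gives the number of requests that completed since the last call, and the entries where 'ok' is false or 'http_code' is unexpected are the ones worth logging.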

    Also, as a small optimization, you should call curl_multi_select() after reading the messages, not before. That way your CPU is busy handling the messages while waiting for network activity, rather than sleeping until there is network activity and THEN reading the messages from the previous exec(). Put simply, the code should be faster if you put select() after info_read(), not before.
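
    Putting all of the above together, the loop could look roughly like the sketch below. The wrapper name runMulti() and its $onDone callback are assumptions made for illustration; the callback stands in for the counting and logTimeElapsed() logic from the question. The 1.0 second select timeout is also just a placeholder value.

```php
<?php
/**
 * Drive all easy handles already attached to $mh until they finish.
 * Hypothetical wrapper, written for illustration only.
 *
 * @param resource|CurlMultiHandle $mh
 * @param callable $onDone called as $onDone(int $completedSoFar, array $info)
 * @return int number of completed requests
 */
function runMulti($mh, callable $onDone): int
{
    $completed = 0;

    do {
        $status = curl_multi_exec($mh, $running);

        // 1. Read every message produced so far, not just the first one.
        while (false !== ($info = curl_multi_info_read($mh))) {
            $completed++;
            $onDone($completed, $info);
            curl_multi_remove_handle($mh, $info['handle']);
        }

        // 2. Only then wait for more network activity.
        if ($running && $status === CURLM_OK) {
            curl_multi_select($mh, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    return $completed;
}
```

    Because the messages are drained after every exec() call, including the last one where $running drops to 0, the returned count should match the number of handles that were added. To keep the original behaviour, the callback would call logTimeElapsed() whenever its first argument is a multiple of 10.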