Summary
I have some PHP 5.4 code which fetches a batch of Facebook/Instagram photos in parallel using multi curl. This code has been working for years, and nothing has changed as far as I can tell.
I add multiple curl requests to a 'multi' request. Each curl request gets a CURLOPT_TIMEOUT
. The problem I'm seeing is that, all of a sudden, some of my requests don't complete until this timeout is reached (no matter what timeout I set).
Code
I do something like this (simplified):
do {
while (CURLM_CALL_MULTI_PERFORM === curl_multi_exec($mh, $running));
// Wait for activity on any curl-connection (optional, reduces CPU)
curl_multi_select($mh);
// a request was just completed -- find out which one
while($done = curl_multi_info_read($mh))
{
$completedCurlRequest = $done['handle'];
//save the file
do_some_work(completedCurlRequest);
curl_multi_remove_handle($mh, $completedCurlRequest);
}
} while ($running);
I use this script to run batches of about 40 parallel requests to fetch some images (from Facebook). Most of them take about 500ms to complete. However, a few of the requests "hang" (until the CURLOPT_TIMEOUT
) before they arrive.
Basically the curl_multi_select
step takes the entire timeout. Or, if I remove that curl_multi_select
line, the outer loop spins (burning CPU) until the timeout.
Considerations
It doesn't matter what the timeout is - if I set the timeout to 30s, they arrive after 30 seconds, If I set the timeout to 1s, they arrive after 1s!
This is a really sudden change that does not correlate with any code release - it was all working fine up until 30th Jan 2019, but on the 31st it suddenly stopped working.
This isn't easy to reproduce, as it only affects an image once. If I repeat it for a batch of images I already fetched, it works fine the next time round.
It affects both Facebook and Instagram images, so I think the issue must be to do with my code or my server (and not Facebook or Instagram), as they wouldn't have both changed something simultaneously.
Questions
Update Here is the what I get back from a slow request when it finally completes:
INFO
"content_type": "image/jpeg",
"http_code": 200,
"header_size": 377,
"request_size": 180,
"total_time": 15.001012, //<----- Total time == CURLOPT_TIMEOUT
"namelookup_time": 0.007149,
"connect_time": 0.12018,
"pretransfer_time": 0.441911,
"size_download": 40714,
"speed_download": 2714,
"download_content_length": -1, //<------Not set
HEADER
HTTP/2 200
content-type: image/jpeg
x-haystack-needlechecksum: 3529661797
timing-allow-origin: *
access-control-allow-origin: *
cache-control: max-age=1209600, no-transform
date: Mon, 04 Feb 2019 14:04:17 GMT
access-control-expose-headers: X-FB-CEC-Video-Limit
It is missing the content-length
header, but that always seems to be the case the first time a file is fetched. Only 1 or 2 of the 50 parallel requests are slow, yet all of the requests are missing their content length headers.
If I fetch the same file again, it is much quicker, and I do see content length being set this time
INFO
"download_content_length": 52721,
HEADER
content-length: 52721
My current theory is that there is a bug in Facebook fileserver that means the connection is sometimes not being closed even though the data has been sent, so the connection stays open until the timeout. In the absence of the (optional) content-length header being sent by Facebook's fileserver, cURL can't know if the payload is complete, and so hangs.
My current solution is to 'prime' the fileserver by first making a request for the image without a body, like this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_exec($ch);
This is a pretty quick process, since there is no image being returned. I actually do this in the background using asynchronous multi curl, so I can get on with doing some other processing.
After priming the fileserver, subsequent requests for the files are even quicker than before, as the content-length is known.
This is a bit of a clumsy approach, but in the absence of any response from Facebook so far I'm not sure what else to do.