Recently I've started experiencing issues with a piece of code that has been stable for quite a while. It makes a connection to GameStop to retrieve a page there. Worked fine for years, but is now returning a timeout.
At first I assumed there was some sort of IP or user-agent blocking involved. However, I have spun up brand new machines on both DigitalOcean and Vultr, and both experience the same issue. All of these machines can still retrieve the page fine using cURL from the command line, though.
Strangely, the code also works on my local development machine, which is a Windows box. So I'm not sure whether the issue is related to PHP running on Linux?
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,'https://www.gamestop.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
//curl_setopt($ch, CURLOPT_SSLVERSION, 6); // explicitly use TLS v1.2
$html = curl_exec($ch);
$info = curl_getinfo($ch);
$error = curl_error($ch);
curl_close($ch);
echo '<pre>' . var_export($error, true) . '</pre>'
. '<pre>' . var_export($info, true) . '</pre>'
. 'HTML: <textarea>' . htmlspecialchars((string) $html) . '</textarea>';
?>
The above code times out in every non-local environment I've tried to run it in. In the same environments, the page can be fetched with cURL from the command line. I've found some similar questions, but most point towards an issue with the SSL/TLS version. I have tested this as well (see the commented-out line), but with the same result.
Part of the issue is, I'm not sure there is a real way to debug a timeout coming from a server, as really anything could be causing it. The only real clue I've been going off is that it works on a Windows machine, and on command line in higher environments. Any help or insights would be appreciated!
Edit: Was also able to reproduce the issue on a Windows Server 2016 VM.
most likely it's because curl-cli automatically adds a user-agent header, and libcurl/php does not.
"some sort of IP or user-agent blocking involved. However, I have spun up brand new machines on both DigitalOcean and Vultr, and both experience the same issue"
setting up VMs on DigitalOcean/Vultr will not automatically make libcurl add user-agent headers to your https requests. that can be done with:
curl_setopt($ch,CURLOPT_USERAGENT,"curl/".(curl_version()["version"])); // User-Agent: curl/7.52.1
to mimic curl-cli's user-agent string, or something like
curl_setopt($ch,CURLOPT_USERAGENT,"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36");
to pretend that you're a Google Chrome version 71, running on Windows 7 x64.
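putting that together with the code from the question, a minimal sketch could look like the following. the function names (curl_cli_user_agent, build_curl_options, fetch_page) are made up for illustration, only the curl_* calls and CURLOPT_* constants are real, and i'm assuming the missing User-Agent really is the cause of the timeout:

```php
<?php
// Hypothetical helper names; only the curl_* functions and CURLOPT_* constants are real.

/** Build a User-Agent string that mimics the curl command-line tool. */
function curl_cli_user_agent(): string
{
    return 'curl/' . curl_version()['version'];
}

/** Collect the main options from the question, plus an explicit User-Agent. */
function build_curl_options(string $url, ?string $userAgent = null): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,
        // Fall back to the curl-cli style string when no agent is given.
        CURLOPT_USERAGENT      => $userAgent ?? curl_cli_user_agent(),
    ];
}

/** Perform the request; returns the body or throws with curl's error message. */
function fetch_page(string $url, ?string $userAgent = null): string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $userAgent));
    $html  = curl_exec($ch);
    $error = curl_error($ch);
    // The handle is freed automatically when $ch goes out of scope.
    if ($html === false) {
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    return $html;
}
```

calling fetch_page('https://www.gamestop.com/') would then send the curl-cli style string, and passing a second argument swaps in a browser-like string instead.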
many websites (like, for example, Wikipedia.com) block http requests lacking a User-Agent header.
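since the question's code already sets CURLINFO_HEADER_OUT, the headers that were actually sent should be available in $info['request_header'] after curl_exec(). a small sketch for checking whether a User-Agent line is in there (has_user_agent_header is a hypothetical helper, and the two sample header strings are made-up illustrations, not captured output):

```php
<?php
/**
 * Hypothetical helper: report whether a raw request-header string
 * (such as $info['request_header']) contains a User-Agent line.
 */
function has_user_agent_header(string $requestHeaders): bool
{
    // Header names are case-insensitive; match at the start of any line.
    return preg_match('/^User-Agent:/mi', $requestHeaders) === 1;
}

// Illustrative header strings, not real captured output.
$withoutUa = "GET / HTTP/1.1\r\nHost: www.gamestop.com\r\nAccept: */*\r\n\r\n";
$withUa    = "GET / HTTP/1.1\r\nHost: www.gamestop.com\r\nUser-Agent: curl/7.52.1\r\nAccept: */*\r\n\r\n";

var_dump(has_user_agent_header($withoutUa)); // bool(false)
var_dump(has_user_agent_header($withUa));    // bool(true)
```

in the real script you would pass $info['request_header'] instead of the sample strings, to confirm the header is present once CURLOPT_USERAGENT is set.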