Here's the URL: https://www.grammarly.com
I'm trying to fetch HTTP headers by using the native get_headers() function:
$headers = get_headers('https://www.grammarly.com');
The result is:
HTTP/1.1 400 Bad Request
Date: Fri, 27 Apr 2018 12:32:34 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 52
Connection: close
But if I do the same with the curl command line tool, the result is different:
curl -sI https://www.grammarly.com/
HTTP/1.1 200 OK
Date: Fri, 27 Apr 2018 12:54:47 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 25130
Connection: keep-alive
What is the reason for this difference in responses? Is it some kind of poorly implemented security feature on Grammarly's server-side or something else?
This happens because get_headers() uses the default stream context, which sends almost no HTTP headers with the request, and many remote servers are fussy about that. The missing header most likely to cause problems is User-Agent; curl sends one by default, which would explain why it gets a 200. You can set it before calling get_headers() by using stream_context_set_default(). Here's an example that works for me:
$headers = get_headers('https://www.grammarly.com');
print_r($headers);
// has [0] => HTTP/1.1 400 Bad Request
stream_context_set_default(
    array(
        'http' => array(
            'user_agent' => "php/testing"
        ),
    )
);
$headers = get_headers('https://www.grammarly.com');
print_r($headers);
// has [0] => HTTP/1.1 200 OK
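If you'd rather not change the default context for the whole script, passing a context to that one call may be an option. This is only a minimal sketch, assuming PHP 7.1 or later (where get_headers() accepts a stream context as its third argument). user_agent_context() is a hypothetical helper name I made up for illustration, and I haven't confirmed what status this particular server returns for it.

```php
<?php
// Hypothetical helper (not part of PHP or the example above): builds a
// one-off stream context that sends the given User-Agent header.
function user_agent_context(string $userAgent)
{
    return stream_context_create([
        'http' => ['user_agent' => $userAgent],
    ]);
}

// The context applies to this call only; the default context is untouched.
// Assumes PHP >= 7.1, where the third argument (context) is supported.
$context = user_agent_context('php/testing');
$headers = get_headers('https://www.grammarly.com', false, $context);
print_r($headers);
```

The design choice here is scope: stream_context_set_default() affects every later stream call in the script, while a context passed as an argument affects just the one request.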