Looking at the PHP docs for get_headers()...
array get_headers ( string $url [, int $format = 0 ] )
... there are two ways to run it:
1) format === 0
$headers = get_headers($url);
// or
$headers = get_headers($url, 0);
2) format !== 0
$headers = get_headers($url, 1);
The difference between the two being whether the arrays are numerically indexed (first case)...
(excerpt from docs)
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Sat, 29 May 2004 12:28:13 GMT
[2] => Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
... etc
... or indexed with keys (second case)...
(excerpt from docs)
Array
(
[0] => HTTP/1.1 200 OK
[Date] => Sat, 29 May 2004 12:28:14 GMT
[Server] => Apache/1.3.27 (Unix) (Red-Hat/Linux)
[Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
... etc
In the example given in the docs, the HTTP status code belongs to a numerical index...
[0] => HTTP/1.1 200 OK
... regardless of what format is set to.
Similarly, for every valid URL that I have ever put through get_headers (i.e. many URLs), the status codes have always been under numerical indexes, even when multiple status codes are present...
// Output from JSON.stringify(get_headers($url, 1))
{
"0": "HTTP/1.1 301 Moved Permanently",
"1": "HTTP/1.1 200 OK",
"Date": [
"Thu, 11 Aug 2016 07:12:28 GMT",
"Thu, 11 Aug 2016 07:12:28 GMT"
],
"Content-Type": [
"text/html; charset=iso-8859-1",
"text/html; charset=UTF-8"
]
... etc
But I have not tested (read: cannot test) every URL on every type of server, so I cannot speak in absolutes about the status code indexes.
Is it possible that get_headers($url, 1) could return a non-numerical HTTP status code index? Or is it hard-coded into the function to always return the status codes under numerical indices, no matter what?
Extra reading, not necessary or essential to the question above...
For the curious, my question is mostly to do with optimization. get_headers() is already painfully slow - even when sending a HEAD request instead of GET - and it only gets worse after combing through the returned array with preg_match and a regex.
(The various CURL methods you'll find are even slower. I've tested them against get_headers() with very long lists of URLs, so holster that hip-shot, partner.)
If I know that the status codes are always numerically indexed, then I can speed my code up a bit by ignoring all non-integer indices before running the rest through preg_match. The difference for one URL might only be fractions of a second, but when running this function all day, every day, those little bits add up.
Additionally (Edit #1)
I'm currently only worried about the final HTTP status code (and URL), after all redirects. I was using a method similar to this to get the final URL.
It seems that after running
$headers = array_reverse($headers);
the final status code after the redirects will always be in $headers[0], because array_reverse renumbers the integer keys in the new order. But, once again, this is only a sure thing if the status codes are numerically indexed.
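To make the idea concrete, here is a minimal sketch of the "ignore all non-integer indices" approach. It assumes the status lines really are stored under integer keys, which is exactly what this question is asking about. The helper name final_status_line and the sample array are made up for illustration; the sample data is copied from the redirect output above.

```php
<?php
// Hypothetical helper (not part of PHP): returns the last status line found
// under an integer key of a get_headers($url, 1)-style array, or null.
function final_status_line(array $headers): ?string
{
    $final = null;
    foreach ($headers as $key => $value) {
        // Skip named headers such as "Date" or "Content-Type".
        if (is_int($key) && is_string($value)) {
            $final = $value; // later redirect hops overwrite earlier ones
        }
    }
    return $final;
}

// Sample data copied from the redirect example in the question.
$sample = [
    0 => 'HTTP/1.1 301 Moved Permanently',
    1 => 'HTTP/1.1 200 OK',
    'Date' => ['Thu, 11 Aug 2016 07:12:28 GMT', 'Thu, 11 Aug 2016 07:12:28 GMT'],
];

echo final_status_line($sample), PHP_EOL; // HTTP/1.1 200 OK
```

No regex runs on the named headers here; preg_match would only be needed afterwards to pull the three-digit code out of the single line that is returned.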
The PHP C source code for that function looks like this:
if (!format) {
no_name_header:
add_next_index_str(return_value, zend_string_copy(Z_STR_P(hdr)));
} else {
char c;
char *s, *p;
if ((p = strchr(Z_STRVAL_P(hdr), ':'))) {
... omitted ...
} else {
goto no_name_header;
}
}
In other words, it tests if there's a : in the header, and if so proceeds to index it by its name (omitted here). If there's no :, or if you did not request to $format the result, no_name_header kicks in and it adds the line to the return_value without an explicit index.
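As a rough illustration, the same branching can be restated in PHP. This is only a sketch of the behaviour described above, not the actual C implementation: the function name index_header_lines is hypothetical, and the handling of repeated header names (collecting them into an array) is an assumption based on the "Date" entry in the question's output rather than on the omitted C code.

```php
<?php
// Hypothetical PHP re-statement of the branch logic quoted above; the real
// work happens in C inside get_headers().
function index_header_lines(array $lines, bool $format): array
{
    $result = [];
    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if (!$format || $pos === false) {
            // Same effect as the no_name_header label: next integer index.
            $result[] = $line;
            continue;
        }
        $name  = substr($line, 0, $pos);
        $value = ltrim(substr($line, $pos + 1));
        if (!array_key_exists($name, $result)) {
            $result[$name] = $value;
        } else {
            // Assumption: repeated names are collected into an array,
            // as the "Date" entry in the question's output suggests.
            $result[$name] = array_merge((array) $result[$name], [$value]);
        }
    }
    return $result;
}

$lines = [
    'HTTP/1.1 200 OK',
    'Date: Sat, 29 May 2004 12:28:14 GMT',
    'HTTP/1.1 200 OK: made-up reason phrase', // invented line containing a colon
];
print_r(index_header_lines($lines, true));
```

With $format set to true, the first line keeps the integer key 0, while the invented third line ends up under the string key "HTTP/1.1 200 OK" because it contains a colon.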
So, yes, the status lines should always be numerically indexed, unless the server puts a : into the status line, which would be unusual. Note that RFC 2616 does not explicitly prohibit the use of : in the reason phrase part of the status line:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
Reason-Phrase = *<TEXT, excluding CR, LF>
TEXT = <any OCTET except CTLs,
but including LWS>
There is no standardised reason phrase which contains a ":", but you never know, you may encounter exotic servers in the wild which defy convention here…
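If you want to be safe against such servers, one option is to call get_headers() without the format flag, so every line stays whole under an integer index, and then pick out the lines that start with HTTP/. Below is a minimal sketch of that idea, run on a hand-written list instead of a live request; the helper name last_status_code and the sample lines (including the colon in the reason phrase) are invented for illustration.

```php
<?php
// Hypothetical helper: picks the last line that looks like a status line
// from a get_headers($url)-style list (format 0, every line kept whole).
function last_status_code(array $lines): ?int
{
    $code = null;
    foreach ($lines as $line) {
        // Cheap prefix check first, so most header lines skip the regex.
        if (is_string($line) && strncmp($line, 'HTTP/', 5) === 0
            && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

$lines = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://example.com/',
    'HTTP/1.1 200 OK: made-up reason phrase',
    'Content-Type: text/html; charset=UTF-8',
];
var_dump(last_status_code($lines)); // int(200)
```

The strncmp prefix test keeps the cost per header line low, which matters for the all-day, every-day use case from the question, and the result does not depend on how the keyed format treats a colon.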