I would like to obtain the headers of a resource without actually downloading it, mainly because I am trying to inspect the headers of large media files. However, the URLs are behind redirects, and I need to follow those redirects to reach the actual headers of the media. I do not know how many redirects are in place, and the number can vary per URL.
The answer below explains how to obtain only the headers while sending a POST request:
curl -s -I -X POST http://www.google.com
https://stackoverflow.com/a/38679650
This works for my use case (using GET instead of POST), as I can obtain headers such as the next redirect location without actually downloading the media. I can then repeat this for each redirect until I reach the headers of the actual media.
However, I have no idea how to perform both a HEAD and a GET request like this in PHP. Is this possible using a library such as Guzzle?
One possibility is to abort the GET request once you have received the header(s) you need. Example:
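$url = "http://www.example.com/";
$ch = curl_init($url);
curl_setopt_array($ch, array(
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HEADER => true,
    CURLINFO_HEADER_OUT => true,
    CURLOPT_HTTPGET => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HEADERFUNCTION => 'requestHeaderCallback',
));
// curl_exec() returns false when the callback aborts the transfer.
$curlResult = curl_exec($ch);
curl_close($ch);

// Called once per header line; returning anything other than the line length aborts the transfer.
function requestHeaderCallback($ch, $header) {
    $matches = array();
    // Match the status line, e.g. "HTTP/1.1 200 OK" or "HTTP/2 200".
    if (preg_match('#^HTTP/\d(?:\.\d)? (\d{3})#', $header, $matches)) {
        // Abort on the first response that is not a 3xx redirect.
        if ($matches[1] < 300 || $matches[1] >= 400) {
            return 0;
        }
    }
    return strlen($header);
}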
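Note that a callback which returns 0 on the status line stops the transfer before the remaining headers of that response arrive. If the goal is to read the headers of the final resource, one possible variation is to collect each header line in the callback and abort only at the blank line that ends the header block of the first non-redirect response. The following is a minimal sketch under that assumption; `makeHeaderCollector` and `fetchFinalHeaders` are hypothetical helper names, and the redirect cap of 10 is an arbitrary choice.

```php
<?php
// Hypothetical helper: builds a CURLOPT_HEADERFUNCTION callback that stores the
// headers of the current response in $headers and aborts the transfer once the
// header block of the first non-redirect response is complete.
function makeHeaderCollector(array &$headers, ?int &$status): callable
{
    return function ($ch, string $line) use (&$headers, &$status): int {
        if (preg_match('#^HTTP/\d(?:\.\d)? (\d{3})#', $line, $m)) {
            // Status line: a new response starts, so drop the previous headers.
            $status = (int) $m[1];
            $headers = [];
        } elseif (trim($line) === '') {
            // Blank line: end of this response's header block.
            $isFinal = $status !== null && $status >= 200
                && ($status < 300 || $status >= 400);
            if ($isFinal) {
                return 0; // any value other than the line length aborts the transfer
            }
        } elseif (strpos($line, ':') !== false) {
            [$name, $value] = explode(':', $line, 2);
            $headers[strtolower(trim($name))] = trim($value);
        }
        return strlen($line);
    };
}

// Hypothetical wrapper: returns the status and headers of the final response.
function fetchFinalHeaders(string $url): array
{
    $headers = [];
    $status = null;
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 10, // arbitrary cap, an assumption
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADERFUNCTION => makeHeaderCollector($headers, $status),
    ]);
    // curl_exec() returns false here because the callback aborts on purpose.
    curl_exec($ch);
    return ['status' => $status, 'headers' => $headers];
}
```

Calling `fetchFinalHeaders("http://www.example.com/")` would then return the status code and the lower-cased header names of the last response in the redirect chain. Because the abort is deliberate, the `false` result of `curl_exec()` cannot be used on its own to tell a genuine failure apart from the intended early stop.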
See also: "Is it ok to terminate a HTTP request in the callback function set by CURLOPT_HEADERFUNCTION?"