php curl guzzle

Is there a way to send a HEAD and GET request using php?


I would like to obtain the headers of a resource without actually downloading it, especially because I am trying to inspect the headers of large media files. However, the URLs are behind redirects, and I need to follow the redirects to reach the headers of the actual media. I do not know how many redirects are in place, and the number can vary per URL.

The answer linked below explains how to obtain only the headers while sending a POST request:

curl -s -I -X POST http://www.google.com

https://stackoverflow.com/a/38679650

This works for my use case (using GET instead of POST), as I can obtain headers such as the next redirect location without actually downloading the media. I can then repeat this for each redirect until I get the headers of the actual media.

However, I have no idea how to perform BOTH a HEAD and a GET request using PHP. Is this possible using a library such as Guzzle?


Solution

  • One possibility is to abort the GET request once you have received the header(s) you need. Example:

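    $url = "http://www.example.com/";
    
    $ch = curl_init($url);
    
    curl_setopt_array($ch, array(
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_HEADER => true,
        CURLINFO_HEADER_OUT => true,
        CURLOPT_HTTPGET => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADERFUNCTION => 'requestHeaderCallback',
    ));
    
    // curl_exec() returns false here, because the callback aborts the transfer on purpose.
    $curlResult = curl_exec($ch);
    
    curl_close($ch);
    
    // Called once per received header line, including each status line.
    function requestHeaderCallback($ch, $header) {
        $matches = array();
        // Match the status line, e.g. "HTTP/1.1 302 Found" or "HTTP/2 200".
        if (preg_match('#^HTTP/\d(?:\.\d)? (\d{3})#', $header, $matches)) {
            $status = (int) $matches[1];
            if ($status < 300 || $status >= 400) {
                // Not a redirect: returning anything other than the line length aborts the transfer.
                return 0;
            }
        }
        return strlen($header);
    }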
    

    See also: "Is it ok to terminate a HTTP request in the callback function set by CURLOPT_HEADERFUNCTION?"
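If you also want to keep the headers of the final response, the same idea can be extended roughly as follows. This is only a sketch: the helper name `makeHeaderCollector` and the layout of the `$state` array are made up for illustration and are not part of cURL. The callback starts a fresh header list at every status line, and aborts at the blank line that ends the header block of the first response that is not a redirect carrying a `Location` header, so the body is never transferred.

```php
<?php
// Hypothetical helper (not part of cURL): returns a callback suitable for
// CURLOPT_HEADERFUNCTION that records the status and headers of the most
// recent response in $state.
function makeHeaderCollector(&$state): callable
{
    $state = array('status' => null, 'headers' => array());

    return function ($ch, string $line) use (&$state): int {
        $trimmed = trim($line);

        if (preg_match('#^HTTP/\d(?:\.\d)? (\d{3})#', $trimmed, $m)) {
            // A status line starts a new header block (one block per redirect hop).
            $state['status'] = (int) $m[1];
            $state['headers'] = array();
        } elseif ($trimmed === '') {
            // A blank line ends the current header block.
            $status = $state['status'];
            $informational = $status !== null && $status < 200;
            $redirecting = $status >= 300 && $status < 400
                && isset($state['headers']['location']);
            if ($status !== null && !$informational && !$redirecting) {
                // Final response: a return value other than the line length
                // makes cURL abort before the body is transferred.
                return 0;
            }
        } elseif (strpos($trimmed, ':') !== false) {
            // Ordinary "Name: value" header line; names are stored in lower case.
            list($name, $value) = explode(':', $trimmed, 2);
            $state['headers'][strtolower(trim($name))] = trim($value);
        }

        return strlen($line);
    };
}
```

To use it, pass `makeHeaderCollector($state)` as the `CURLOPT_HEADERFUNCTION` option together with `CURLOPT_FOLLOWLOCATION => true`. `curl_exec()` will then return `false` because of the deliberate abort, `$state['headers']` holds the headers of the final response, and `curl_getinfo($ch, CURLINFO_EFFECTIVE_URL)` (called before `curl_close()`) should give the URL the redirects ended at.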