php curl header

How to retrieve only the headers in PHP via curl?


Actually I have two questions.

  1. Does the remote server use less processing power or bandwidth if I retrieve only the headers, as opposed to the full page, using PHP and curl?
  2. Since I think (and I might be wrong) that the answer to the first question is yes, I am trying to get only the last-modified date or If-Modified-Since header of a remote file, so I can compare it with the date and time of my locally stored data and, if the file has changed, store it locally. However, my script seems unable to fetch that piece of information; I get NULL when I run this:
    class last_change {
    
     public $last_change;
    
     function set_last_change() {
      $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, "http://url/file.xml");
        curl_setopt($curl, CURLOPT_HEADER, true);
        curl_setopt($curl, CURLOPT_FILETIME, true);
        curl_setopt($curl, CURLOPT_NOBODY, true);
      // $header = curl_exec($curl);
      $this -> last_change = curl_getinfo($header);
      curl_close($curl);
     }
    
     function get_last_change() {
      return $this -> last_change['datetime']; // I have tested with Last-Modified & If-Modified-Since to no avail
     }
    
    }
    
    If $header = curl_exec($curl) is uncommented, the header data is displayed, even though I haven't asked for it to be printed, and looks like this:
    HTTP/1.1 200 OK
    Date: Fri, 04 Sep 2009 12:15:51 GMT
    Server: Apache/2.2.8 (Linux/SUSE)
    Last-Modified: Thu, 03 Sep 2009 12:46:54 GMT
    ETag: "198054-118c-472abc735ab80"
    Accept-Ranges: bytes
    Content-Length: 4492
    Content-Type: text/xml
    

Based on that, 'Last-Modified' is returned.

So, what am I doing wrong?


Solution

  • You are passing $header to curl_getinfo(). It should be $curl (the curl handle). You can get just the filetime by passing CURLINFO_FILETIME as the second parameter to curl_getinfo(). (Often the filetime is unavailable, in which case it will be reported as -1).

    Your class seems to be wasteful, though, throwing away a lot of information that could be useful. Here's another way it might be done:

    class URIInfo 
    {
        public $info;
        public $header;
        private $url;
    
        public function __construct($url)
        {
            $this->url = $url;
            $this->setData();
        }
    
        public function setData() 
        {
            $curl = curl_init();
            curl_setopt($curl, CURLOPT_URL, $this->url);
            curl_setopt($curl, CURLOPT_FILETIME, true);
            curl_setopt($curl, CURLOPT_NOBODY, true);
            curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($curl, CURLOPT_HEADER, true);
            $this->header = curl_exec($curl);
            $this->info = curl_getinfo($curl);
            curl_close($curl);
        }
    
        public function getFiletime() 
        {
            return $this->info['filetime'];
        }
    
        // Other functions can be added to retrieve other information.
    }
    
    $uri_info = new URIInfo('http://www.codinghorror.com/blog/');
    $filetime = $uri_info->getFiletime();
    if ($filetime != -1) {
        echo date('Y-m-d H:i:s', $filetime);
    } else {
        echo 'filetime not available';
    }
    

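When `filetime` comes back as -1, the `Last-Modified` line can still be read from the raw header text that the class above keeps in `$header`. The following is a minimal sketch of that idea, not part of the original answer. The function names `parse_last_modified` and `is_remote_newer` are made up for illustration, and it assumes the header block looks like the one shown in the question (if redirects are followed there may be several header blocks, and this sketch only reads the first match).

```php
<?php
/**
 * Extract the Last-Modified time from a raw HTTP header block.
 * Returns a Unix timestamp, or null if the header is missing or unparseable.
 * (Hypothetical helper, not part of cURL or PHP itself.)
 */
function parse_last_modified(string $rawHeader): ?int
{
    // Header names are case-insensitive; "m" lets ^ and $ match per line.
    if (!preg_match('/^Last-Modified:\s*(.+?)\s*$/mi', $rawHeader, $m)) {
        return null;
    }
    $timestamp = strtotime($m[1]);
    return $timestamp === false ? null : $timestamp;
}

/**
 * Decide whether the remote file is newer than the local copy.
 * Without a usable Last-Modified value this reports "not newer".
 */
function is_remote_newer(string $rawHeader, int $localTimestamp): bool
{
    $remote = parse_last_modified($rawHeader);
    return $remote !== null && $remote > $localTimestamp;
}
```

With the `URIInfo` class above, this could be called as `is_remote_newer($uri_info->header, filemtime($localPath))`, where `$localPath` is a placeholder for wherever the local copy is stored.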
    Yes, the load on the server will be lighter, since it returns only the HTTP headers (with CURLOPT_NOBODY set, curl sends a HEAD request). How much lighter will vary greatly.
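Since the question mentions If-Modified-Since, a related way to save bandwidth is a conditional GET: the local timestamp is sent with the request, and a server that honors it replies 304 Not Modified with no body when nothing has changed. Below is a rough sketch under that assumption. `conditional_get_options` and `fetch_if_modified` are hypothetical names, and a server that ignores the header may behave differently.

```php
<?php
/**
 * Build cURL options for a conditional GET.
 * (Hypothetical helper; the option constants are standard PHP cURL ones.)
 */
function conditional_get_options(string $url, int $localTimestamp): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        // Ask cURL to send an If-Modified-Since header built from
        // the timestamp given in CURLOPT_TIMEVALUE.
        CURLOPT_TIMECONDITION  => CURL_TIMECOND_IFMODSINCE,
        CURLOPT_TIMEVALUE      => $localTimestamp,
    ];
}

/**
 * Return the new body, or null when the request fails, the status is
 * not 200 (for example 304 Not Modified), or no body was delivered.
 */
function fetch_if_modified(string $url, int $localTimestamp): ?string
{
    $curl = curl_init();
    curl_setopt_array($curl, conditional_get_options($url, $localTimestamp));
    $body = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    if ($body === false || $body === '' || $status !== 200) {
        return null;
    }
    return $body;
}
```

Compared with a HEAD request followed by a separate download, this needs only one round trip when the file has changed, at the cost of not seeing the headers first.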