php, curl, curl-multi

Is it possible to get the origin URL when using a CURL multi execute in PHP with follow location?


We are writing a script that checks the links in our system for invalid status codes. For example, someone creates a page on our website and fills it with links, but after many years some of the URLs no longer work (4xx/5xx status codes).

To check whether a link is valid, I've written a curl snippet that gets the status code. Because of the number of links, I use curl_multi_exec to run the requests asynchronously.

But now I have a problem. If a URL returns a 3xx status code, there is a redirect. In that case, I have to follow it to the "real" URL and get that status code. PHP has a curl option for that: CURLOPT_FOLLOWLOCATION.

Here is the problem: when there is a redirect, curl reports the correct status code, but together with the destination URL instead of the URL I requested. We have to update the status code of the "origin" URL with the status code of the "destination" URL.

For example, let's say http://example.com redirects to https://example.com. In that case, we receive the status code of https://example.com, but we have to store that status code on the record of http://example.com.

Here are the snippets I made:

// CURL Options
$options = array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HEADER         => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_ENCODING       => "",
    CURLOPT_AUTOREFERER    => true,
    CURLOPT_CONNECTTIMEOUT => 10,
    CURLOPT_TIMEOUT        => 10,
    CURLOPT_NOBODY         => true
);

// Init CURL Multi
$mh = curl_multi_init();

To add a URL:

$ch = curl_init(trim($RowFromDatabase->Url));
curl_setopt_array($ch, $options);
curl_multi_add_handle($mh, $ch);

And here is where I run all checks:

do {
    // Run all URL's
    while(($exec = curl_multi_exec($mh, $running)) == CURLM_CALL_MULTI_PERFORM);
    if($exec != CURLM_OK) {
        break;
    }

    // Get info about URL's
    while($ch = curl_multi_info_read($mh)) {
        $ch = $ch['handle'];

        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $info = curl_getinfo($ch);

        // URL (this is the destination URL, I would like to get the origin URL here)
        $url = $info['url'];

        $broken = false;

        if($httpCode >= 400){
            $broken = true;
        }

        if($broken){
            // Update broken in database
            $QueryBroken->bind_param("s",$url);
            $QueryBroken->execute();
        }

        // Handle
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);                
    }
} while($running);
curl_multi_close($mh);

So, basically: I would like to receive the origin url instead of the destination url. Is that possible?


Solution

  • You need to ask CURL to return the headers using CURLOPT_HEADER (together with CURLOPT_RETURNTRANSFER, so the response comes back as a string) and look for the redirect instruction, the Location header, yourself. This is described here:

    http://zzz.rezo.net/HowTo-Expand-Short-URLs.html
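
A minimal sketch of the header-parsing step, assuming CURLOPT_FOLLOWLOCATION is turned off and CURLOPT_HEADER is on, so that each handle keeps the URL it was created with and the raw headers of the first response are available as a string (for a multi handle, via curl_multi_getcontent). The function name parseRedirect and the sample header text are made up for illustration; they are not from the linked article.

```php
<?php
// Hypothetical helper (not from the original post): reads the status code
// and the Location header from the raw headers of a single HTTP response.
function parseRedirect(string $rawHeaders): array
{
    $status = null;
    $location = null;

    // Split on any line ending and inspect the headers one by one.
    foreach (preg_split('/\r\n|\n|\r/', trim($rawHeaders)) as $line) {
        if ($status === null && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // Status line, e.g. "HTTP/1.1 301 Moved Permanently".
            $status = (int) $m[1];
        } elseif (preg_match('/^Location:\s*(\S+)/i', $line, $m)) {
            // Header names are case-insensitive.
            $location = $m[1];
        }
    }

    return ['status' => $status, 'location' => $location];
}

// Assumed sample of what a redirecting server might send back.
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: https://example.com/\r\n"
        . "Content-Length: 0\r\n"
        . "\r\n";

$result = parseRedirect($sample);
// $result['status'] is the 3xx code; $result['location'] is the next URL.
```

In the loop from the question, the idea would be to call this on curl_multi_getcontent($ch) and, when a location comes back, queue a new handle for it while remembering the origin URL it belongs to. A relative Location value would still have to be resolved against the origin URL first. PHP also exposes the next hop directly as curl_getinfo($ch, CURLINFO_REDIRECT_URL), which may make the manual parsing unnecessary.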