We are writing a script that checks the links in our system for invalid status codes. For example, someone creates a page on our website and fills it with links, but years later some of those URLs no longer work (4xx/5xx status codes).
To check whether a link is valid, I wrote a cURL snippet that fetches the status code. Because of the number of links, I use curl_multi_exec to run the requests asynchronously.
But now I have a problem. If a URL returns a 3xx status code, it means there is a redirect. In that case, I have to follow it to the "real" URL and get that status code. PHP has a cURL option for that: CURLOPT_FOLLOWLOCATION.
Here is the problem: when there is a redirect, cURL reports the correct status code, but together with the destination URL rather than the URL I requested. We have to update the status code of the "origin" URL with the status code of the "destination" URL.
For example: let's say http://example.com redirects to https://example.com. In that case, we receive the status code of https://example.com, but we have to store that status code on the record of http://example.com.
Here are the snippets I made:
// CURL Options
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => 10,
CURLOPT_TIMEOUT => 10,
CURLOPT_NOBODY => true
);
// Init CURL Multi
$mh = curl_multi_init();
To add a URL:
$ch = curl_init(trim($RowFromDatabase->Url));
curl_setopt_array($ch, $options);
curl_multi_add_handle($mh, $ch);
And here is where I run all checks:
do {
// Run all URLs
while(($exec = curl_multi_exec($mh, $running)) == CURLM_CALL_MULTI_PERFORM);
if($exec != CURLM_OK) {
break;
}
// Get info about finished URLs
while($ch = curl_multi_info_read($mh)) {
$ch = $ch['handle'];
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$info = curl_getinfo($ch);
// URL (this is the destination URL, I would like to get the origin URL here)
$url = $info['url'];
$broken = false;
if($httpCode >= 400){
$broken = true;
}
if($broken){
// Update broken in database
$QueryBroken->bind_param("s",$url);
$QueryBroken->execute();
}
// Handle
curl_multi_remove_handle($mh, $ch);
curl_close($ch);
}
} while($running);
curl_multi_close($mh);
So, basically: I would like to receive the origin url instead of the destination url. Is that possible?
You need to ask cURL to return the headers, using CURLOPT_HEADER together with CURLOPT_RETURNTRANSFER, and look for the redirect instruction (the Location header) yourself. This is described here:
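A different route, which avoids parsing headers, is to attach the origin URL to the handle itself and read it back once the transfer has finished. The sketch below assumes PHP's CURLOPT_PRIVATE option and the matching CURLINFO_PRIVATE info key are acceptable for this; makeTaggedHandle() and originUrlOf() are hypothetical helper names, not part of the question's code. It only shows the tagging and does not perform any request.

```php
<?php
// Sketch: tag each cURL handle with the URL that was originally requested.
// makeTaggedHandle() and originUrlOf() are illustrative names (assumptions).

function makeTaggedHandle(string $originUrl, array $options)
{
    $ch = curl_init($originUrl);
    curl_setopt_array($ch, $options);
    // CURLOPT_PRIVATE stores an arbitrary string on the handle.
    // cURL does not modify it when it follows redirects.
    curl_setopt($ch, CURLOPT_PRIVATE, $originUrl);
    return $ch;
}

function originUrlOf($ch): string
{
    // Returns the string stored with CURLOPT_PRIVATE.
    return (string) curl_getinfo($ch, CURLINFO_PRIVATE);
}

// Reduced version of the options from the question.
$options = [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_NOBODY         => true,
    CURLOPT_CONNECTTIMEOUT => 10,
    CURLOPT_TIMEOUT        => 10,
];

$ch = makeTaggedHandle('http://example.com', $options);

// Inside the curl_multi_info_read() loop, after a transfer has finished:
//   $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE); // status of the final URL
//   $origin   = originUrlOf($ch);                      // URL originally requested
$origin = originUrlOf($ch);
```

In the loop from the question, $origin would then be bound to $QueryBroken instead of $info['url']. Keeping your own array that maps each handle to its URL would work too; the private-data option just keeps that bookkeeping on the handle.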