php regex

Matching URLs against array of shortening services


Consider the following list of URLs:

1 http://www.cnn.com/international/stories/423423532
2 http://www.traderscreener.com/blah
3 http://is.gd/fsdaGdfd3
4 http://bit.ly/54HFD
5 http://stackoverflow.com/question/ask

I would like to expand shortened URLs to their original form:

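// Fetch the response headers as an associative array.
$headers = get_headers($URL, 1);
if (!empty($headers['Location'])) {
  // Location is an array when there were several redirects; take the last one.
  $headers['Location'] = (array) $headers['Location'];
  $URL = array_pop($headers['Location']);
}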

However, I need to match all URLs against an array of shortening services:

$array = array(
  'is.gd', 'bit.ly', 'wibi.us', 'tinyurl.com' // etc
);

In this case, only URLs 3 and 4 should be matched and expanded. I believe the easiest way to do this would be to grab *** in http://***/blah and compare it against the array. Since I have little experience with regex, what regex would I need? Or is there a better way to approach this?


Solution

  • By far the easiest way to do this is not to build a blacklist. Instead, query the URL and see if it redirects. Send a HEAD request and check the status code. If it is 3xx, the URL redirects, so read the "Location" header and use that as the new URL.
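
    A minimal sketch of that approach, assuming PHP 7.1 or later (where get_headers() accepts a stream context as its third argument). The function names redirect_target() and expand_url(), the 5-second timeout, and the limit of five hops are illustrative choices, not part of the original answer. Relative Location values are not resolved here.

```php
<?php
/**
 * Pick the redirect target out of headers shaped like the result of
 * get_headers($url, true). Returns null when the response is not a 3xx
 * or carries no Location header. (Hypothetical helper name.)
 */
function redirect_target(array $headers): ?string
{
    // $headers[0] is the status line, e.g. "HTTP/1.1 301 Moved Permanently".
    if (!isset($headers[0]) || !preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) {
        return null;
    }
    $status = (int) $m[1];
    if ($status < 300 || $status > 399) {
        return null;
    }
    // Header names are case-insensitive, so normalise the keys before the lookup.
    $lower = array_change_key_case($headers, CASE_LOWER);
    if (empty($lower['location'])) {
        return null;
    }
    // Location may be a string or, if the header was repeated, an array; take the last value.
    $locations = (array) $lower['location'];
    return end($locations);
}

/**
 * Follow redirects one hop at a time with HEAD requests and return the
 * last URL reached. (Hypothetical helper name; $maxHops is an assumed safety limit.)
 */
function expand_url(string $url, int $maxHops = 5): string
{
    // Send HEAD instead of GET and do not follow redirects automatically,
    // so the status code of each hop can be inspected.
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'follow_location' => 0, 'timeout' => 5],
    ]);
    for ($i = 0; $i < $maxHops; $i++) {
        $headers = @get_headers($url, true, $context);
        if ($headers === false) {
            break; // request failed; keep the URL we have
        }
        $next = redirect_target($headers);
        if ($next === null) {
            break; // not a redirect, so this is the final URL
        }
        $url = $next;
    }
    return $url;
}
```

    Calling expand_url('http://bit.ly/54HFD') would then return the destination if the link redirects, and the input unchanged otherwise, so no list of shortening services has to be maintained.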