Tags: python, html, urlopen

Python check if website exists


I want to check whether a certain website exists; this is what I'm doing:

import urllib2

user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
link = "http://www.abc.com"
req = urllib2.Request(link, headers=headers)
page = urllib2.urlopen(req).read()  # ERROR 402 generated here!

If the page doesn't exist (error 402, or whatever other error), what can I do in the page = ... line to make sure that the page I'm reading actually exists?


Solution

  • You can send a HEAD request instead of GET. It downloads only the headers, not the content, so you can then check the response status without fetching the page body.

    For Python 2.7.x you can use httplib (a Python 3 equivalent using http.client is sketched after this list):

    import httplib
    c = httplib.HTTPConnection('www.example.com')
    c.request("HEAD", '/')  # request only the headers for the root path
    if c.getresponse().status == 200:
        print('web site exists')
    

    or urllib2:

    import urllib2
    try:
        urllib2.urlopen('http://www.example.com/some_page')
    except urllib2.HTTPError, e:
        # the server answered, but with an HTTP error code (404, 402, ...)
        print(e.code)
    except urllib2.URLError, e:
        # the request never reached a server (DNS failure, connection refused, ...)
        print(e.args)
    

    or, for both 2.7 and 3.x, you can install requests:

    import requests
    response = requests.get('http://www.example.com')
    if response.status_code == 200:
        print('Web site exists')
    else:
        print('Web site does not exist')
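
    Since the whole point of the first snippet is to avoid downloading the body, requests can do the same thing with a HEAD request. A minimal sketch of that idea (www.example.com is only a placeholder host):

    import requests

    # HEAD returns only the headers, so the body is never transferred
    response = requests.head('http://www.example.com')
    if response.status_code < 400:
        print('Web site exists')
    else:
        print('Web site does not exist')

    Note that requests.head() does not follow redirects by default, so a site answering with a 301/302 still counts as "exists" under this check.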
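
    For completeness: in Python 3 the httplib module is named http.client, so the HEAD check from the first snippet becomes roughly the following sketch (same placeholder host):

    import http.client

    conn = http.client.HTTPConnection('www.example.com')
    conn.request("HEAD", "/")  # headers only, no body
    if conn.getresponse().status == 200:
        print('web site exists')
    conn.close()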