
Python reads data from webpages


I have a list of IP addresses, and I am wondering whether it is possible to use Python to determine the country of each address by extracting the information from this website (http://www.whatip.com/ip-lookup). Please see the screenshot below. For example: IPlist = ["100.43.90.10","125.7.8.9.9"]

Here is my code. I understand that I can build the lookup URL by appending the IP address to the base URL. For the address below, I want to get "United States".

Here is the screenshot of where "United States" is located: [screenshot]

    import urllib.request

    with urllib.request.urlopen('http://www.whatip.com/ip/100.43.90.10') as response:
        html = response.read()   # raw bytes
        print(html)
        text = html.decode()     # bytes -> str

        # Index of the label cell; the country name should follow it
        start = text.find("<td>Country:</td>")

I checked that "Country" appears only once in the page source. I understand that I need to find the index of "Country" and then print out "United States", but I got stuck. Could anyone please tell me how to do it? Many thanks!


Solution

  • I would suggest using one of the many REST APIs available for IP geolocation.

    This doesn't require you to install any new modules or scrape any web pages. The request returns a JSON object, which you can parse with the built-in json module to get a Python dictionary directly.

    I had a quick play with nekudo and it appears to work well:

    import json
    from http import client
    
    # Open a connection to the API host
    conn = client.HTTPConnection("geoip.nekudo.com")
    
    # Make the request and extract the data
    conn.request("GET","/api/172.217.3.110/full")
    json_data = conn.getresponse().read().decode()
    
    # Convert the JSON to a Python object
    data = json.loads(json_data)
    

    data is now a Python dictionary containing all the information you need:

    >>> data['registered_country']['names']['en']
    'United States'
    
    >>> data['location']
    {'latitude': 37.4192, 'metro_code': 807, 'time_zone': 'America/Los_Angeles', 'longitude': -122.0574}
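
Since the question starts from a list of addresses, the same request can be wrapped in a loop. The following is only a minimal sketch: the helper names (country_from_json, fetch_geoip_json, lookup_countries) are made up for illustration, and it assumes the service keeps returning JSON shaped like the example above. Any address that fails to resolve or parse (such as a malformed one) is mapped to None rather than raising.

```python
import json
from http import client


def country_from_json(json_text):
    """Return the English country name from a response shaped like the
    example above, or None if the expected keys are missing."""
    data = json.loads(json_text)
    try:
        return data["registered_country"]["names"]["en"]
    except (KeyError, TypeError):
        return None


def fetch_geoip_json(ip, host="geoip.nekudo.com"):
    """Fetch the raw JSON text for one IP from the endpoint used above.
    The host and path are assumptions carried over from the answer."""
    conn = client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", "/api/{}/full".format(ip))
        return conn.getresponse().read().decode()
    finally:
        conn.close()


def lookup_countries(ips, fetch=fetch_geoip_json):
    """Map each IP to a country name; failed lookups map to None."""
    results = {}
    for ip in ips:
        try:
            results[ip] = country_from_json(fetch(ip))
        except (OSError, ValueError, client.HTTPException):
            # Network errors, undecodable bytes, or invalid JSON
            results[ip] = None
    return results
```

Calling lookup_countries(IPlist) would then give a dictionary keyed by IP address. The fetch parameter is there so the parsing logic can be tried with canned responses without touching the network.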