python, web-scraping, python-requests, wget, user-agent

How can I use Python's Requests to fake a browser visit, i.e., generate a User-Agent?


I want to get the content from this website.

If I use a browser like Firefox or Chrome, I get the real page I want, but if I fetch it with the Python Requests package (or the wget command), it returns a totally different HTML page.

I suspect the developer of the website has put some blocks in place against this.

How do I fake a browser visit using Python's Requests or the wget command?


Solution

  • Provide a User-Agent header:

    import requests
    
    url = 'http://www.ichangtou.com/#company:data_000008.html'
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    
    response = requests.get(url, headers=headers)
    print(response.content)
    
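The same trick works for wget: its `--user-agent` option (short form `-U`) sets the User-Agent header on the request. A minimal sketch using the same URL and UA string as above (the `|| echo` simply keeps the command from aborting a script if the host is unreachable):

```shell
UA='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'

# Fetch the page with a browser-like User-Agent; -U is short for --user-agent.
wget -U "$UA" -O page.html 'http://www.ichangtou.com/#company:data_000008.html' \
    || echo "fetch failed (host unreachable?)"
```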

    FYI, lists of User-Agent strings for different browsers are easy to find online.


    As a side note, there is a pretty useful third-party package called fake-useragent that provides a nice abstraction layer over user agents:

    fake-useragent

    Up to date simple useragent faker with real world database

    Demo:

    >>> from fake_useragent import UserAgent
    >>> ua = UserAgent()
    >>> ua.chrome
    u'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
    >>> ua.random
    u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'
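    To tie the two approaches together, here is a small sketch that picks a random User-Agent, using fake-useragent when it is installed and falling back to a short hardcoded pool otherwise, and builds the headers dict you would pass to requests.get. The names FALLBACK_UAS and random_user_agent are illustrative, not part of either library:

    ```python
    import random

    # Fallback pool of real User-Agent strings (taken from the demo above).
    # fake-useragent, when available, draws from a much larger database.
    FALLBACK_UAS = [
        'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36',
    ]

    def random_user_agent():
        """Return a random browser User-Agent string."""
        try:
            from fake_useragent import UserAgent
            return UserAgent().random
        except Exception:
            # fake-useragent missing or unable to load its database.
            return random.choice(FALLBACK_UAS)

    headers = {'User-Agent': random_user_agent()}
    print(headers['User-Agent'])
    ```

    Pass `headers=headers` to `requests.get()` exactly as in the snippet at the top of this answer.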