python web-scraping http-status-code-404

Python 3.2: Battle.net scraper gives 404 on certain URLs


I'm writing a battle.net screen scraper in Python, and I want to scrape this page: http://us.battle.net/sc2/en/game/unit

Problem is, I get a 404 when I try to download it using my script. However, viewing it in a web browser works just fine.

Here is the code I'm using if it helps (requests needed):

import requests

def download(url, max_retries=10):
    # Try the request up to max_retries times; return the body on the first 200.
    for i in range(max_retries):
        print('Downloading: ' + url)
        r = requests.get(url)

        print('Status code: ' + str(r.status_code))

        if r.status_code == requests.codes.ok:
            return r.content
    # Every attempt returned a non-200 status.
    return None

download('http://us.battle.net/sc2/en/game/unit')

Thanks for any answers.


Solution

  • Try this. Apparently the trailing slash is required: the only change from your code is the '/' at the end of the URL passed to download().

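    import requests
    
    def download(url, max_retries=10):
        # Try the request up to max_retries times; return the body on the first 200.
        for i in range(max_retries):
            print('Downloading: ' + url)
            r = requests.get(url)
    
            print('Status code: ' + str(r.status_code))
    
            if r.status_code == requests.codes.ok:
                return r.content
        # Every attempt returned a non-200 status.
        return None
    
    # Note the trailing slash at the end of the URL.
    download('http://us.battle.net/sc2/en/game/unit/')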
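
If the scraper builds its URLs from several places, it may be easier to normalize them than to remember the slash every time. Here is a minimal sketch using only the standard library. The helper names with_trailing_slash and candidate_urls are made up for illustration, and it assumes the site wants the slash on every page path, which I have only seen on the URL above.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url):
    """Return url with '/' appended to its path if the path lacks one.

    Hypothetical helper: the query string and fragment are left untouched.
    """
    parts = urlsplit(url)
    path = parts.path
    if not path.endswith('/'):
        path += '/'
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


def candidate_urls(url):
    """Yield the URL as given, then its trailing-slash variant if different."""
    yield url
    alternative = with_trailing_slash(url)
    if alternative != url:
        yield alternative
```

You could then loop over candidate_urls(url) and call download() on each candidate until one returns content. Retrying the exact same URL ten times after a 404 is unlikely to help, so a small max_retries per candidate is probably enough.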