python, web-scraping, beautifulsoup

403 Forbidden Error when scraping a site, user-agents already used and updated. Any ideas?


As the title above states I am getting a 403 error. The URLs generated are valid, I can print them and then open them in my browser just fine.

I've set a User-Agent header; it's the exact same one my browser sends when accessing the page I want to scrape, pulled straight from Chrome DevTools. I've tried using a Session instead of a plain request, I've tried urllib, and I've tried a generic requests.get.

Here's the code I'm using that 403s. Same result with requests.get, etc.

import requests

# Same User-Agent string my browser sends, copied from DevTools
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36'}

session = requests.Session()
req = session.get(URL, headers=headers)

So yeah, I assume I'm not setting the user agent right, and the site can tell I'm scraping. But I'm not sure what I'm missing, or how to find that out.


Solution

  • I copied all the headers from DevTools and removed them one by one. It turns out the request needs only Accept-Language — it doesn't need User-Agent, and it doesn't need a Session.

    import requests
    
    url = 'https://www.g2a.com/lucene/search/filter?&search=The+Elder+Scrolls+V:+Skyrim&currency=nzd&cc=NZD'
    
    headers = {
        'Accept-Language': 'en-US;q=0.7,en;q=0.3',
    }
    
    r = requests.get(url, headers=headers)
    
    data = r.json()
    
    print(data['docs'][0]['name'])
    

    Result:

    The Elder Scrolls V: Skyrim Special Edition Steam Key GLOBAL
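    The remove-one-header-at-a-time process described above can be automated. Below is a minimal sketch; the helper name `minimal_headers` and the `is_ok` probe callback are my own, not from the answer. The demo uses a stand-in probe instead of a live request so it runs offline — in practice `is_ok` would do `requests.get(url, headers=h)` and check for a 200 status.

    ```python
    def minimal_headers(headers, is_ok):
        """Greedily drop headers while the probe still succeeds.

        headers: full header dict copied from DevTools.
        is_ok:   callable taking a header dict and returning True if the
                 request still works with only those headers.
        """
        needed = dict(headers)
        for name in list(needed):
            trial = {k: v for k, v in needed.items() if k != name}
            if is_ok(trial):      # still works without this header,
                needed = trial    # so drop it permanently
        return needed

    # Demo with a stand-in probe: pretend the server only needs Accept-Language.
    full = {
        'User-Agent': 'Mozilla/5.0 ...',
        'Accept-Language': 'en-US;q=0.7,en;q=0.3',
        'Accept': 'application/json',
    }
    print(minimal_headers(full, lambda h: 'Accept-Language' in h))
    # {'Accept-Language': 'en-US;q=0.7,en;q=0.3'}
    ```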