python, python-requests

Redirect from doi.org to onlinelibrary.wiley.com fails


If I open https://doi.org/10.1002/jccs.200600142 in my browser, the page loads fine. But fetching the same URL with requests fails:

python -c "import requests; print(requests.head('https://doi.org/10.1002/jccs.200600142', allow_redirects=True))"

<Response [403]>

I also tried using a Session (so cookies are kept across the redirects) and setting a Firefox User-Agent, but that did not help either:

import requests
with requests.Session() as s:
    print(s.get('https://doi.org/10.1002/jccs.200600142', allow_redirects=True, headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0'}))

<Response [403]>

Does anyone know what requests does differently from Firefox? Or do I need to send more headers?
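One way to start comparing with Firefox is to print the headers requests would actually send, without making a request at all. By default requests adds only User-Agent, Accept-Encoding, Accept and Connection; it sends no Accept-Language and no Sec-Fetch-* headers, which a browser navigation normally includes. The snippet below only prepares the request from the Session attempt above, so nothing goes over the network.

```python
import requests

# Headers requests adds on its own (note: no Accept-Language, no Sec-Fetch-*).
print(dict(requests.utils.default_headers()))

# Headers that would go on the wire for the Session attempt above, after the
# per-request User-Agent is merged with the session defaults.
# prepare_request() builds the request but does not send it.
with requests.Session() as s:
    req = requests.Request(
        "GET",
        "https://doi.org/10.1002/jccs.200600142",
        headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0"},
    )
    prepared = s.prepare_request(req)
    print(dict(prepared.headers))
```

Comparing this output with the request headers shown in the browser's developer tools (Network tab) makes the missing headers easy to spot.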


Solution

  • I had the same problem accessing doi.org URLs.

    Required Request Headers

    Finally, I discovered that the server needs additional HTTP headers before it stops refusing the request. Specifically, all three of Accept-Language, Sec-Fetch-Site and User-Agent must be present; if any one of them is missing, the server responds with a 403 status code.
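To check which headers a server insists on, one approach is to repeat the request once per header, each time leaving one header out, and compare the status codes. This is a minimal sketch assuming network access; `leave_one_out` and `probe` are hypothetical helper names, and the results can change whenever the publisher changes its bot protection. Note that dropping User-Agent from the dict does not send an empty one: requests falls back to its own `python-requests/...` User-Agent.

```python
import requests


def leave_one_out(headers):
    """Hypothetical helper: yield (dropped_name, headers_without_it) pairs."""
    for name in headers:
        yield name, {k: v for k, v in headers.items() if k != name}


def probe(url, headers, timeout=10):
    """Hypothetical helper: map each dropped header name to the resulting status code.

    Performs one GET per header; redirects are followed by default.
    """
    results = {}
    for dropped, subset in leave_one_out(headers):
        resp = requests.get(url, headers=subset, timeout=timeout)
        results[dropped] = resp.status_code
    return results
```

For example, calling `probe("https://doi.org/10.1111/tgis.70037", h)` with the three headers from the working example should show a 403 for each dropped header if the claim above still holds.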

    Working Example

    import requests

    # All three headers are needed; dropping any one of them yields a 403.
    h = {
        "Accept-Language": "en-US",
        "Sec-Fetch-Site": "cross-site",
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:138.0) Gecko/20100101 Firefox/138.0",
    }
    url = "https://doi.org/10.1111/tgis.70037"
    response = requests.get(url, headers=h)  # GET follows redirects by default
    print(response.status_code)  # 200
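If you resolve many DOIs, one option is to set these headers once on a `requests.Session`, which also reuses connections and keeps cookies between calls. This is a sketch under assumptions: `make_doi_session` and `resolve_doi` are hypothetical names, and the header values are just the ones from the working example, which the server may stop accepting at any time.

```python
import requests

# Values copied from the working example; treat them as a starting point only.
BROWSER_LIKE_HEADERS = {
    "Accept-Language": "en-US",
    "Sec-Fetch-Site": "cross-site",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:138.0) Gecko/20100101 Firefox/138.0",
}


def make_doi_session():
    """Hypothetical helper: a Session that sends the browser-like headers on every request."""
    s = requests.Session()
    s.headers.update(BROWSER_LIKE_HEADERS)  # merged on top of requests' own defaults
    return s


def resolve_doi(session, doi, timeout=10):
    """Hypothetical helper: follow the doi.org redirects.

    Returns (final status code, final URL, list of intermediate redirect URLs).
    """
    resp = session.get(f"https://doi.org/{doi}", timeout=timeout)
    hops = [r.url for r in resp.history]  # one entry per redirect response
    return resp.status_code, resp.url, hops
```

Usage: `with make_doi_session() as s: print(resolve_doi(s, "10.1002/jccs.200600142"))` prints the final status, the landing URL on the publisher's site, and the redirect chain, which helps pinpoint which hop returns the 403.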