If I visit https://doi.org/10.1002/jccs.200600142
with my browser, everything is fine. But both a HEAD and a GET request from Python's requests library fail:
python -c "import requests; print(requests.head('https://doi.org/10.1002/jccs.200600142', allow_redirects=True))"
<Response [403]>
I also tried using a session (so cookies are kept) and changing the user-agent, but that did not help either:
import requests
with requests.Session() as s:
    print(s.get('https://doi.org/10.1002/jccs.200600142', allow_redirects=True, headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0'}))
<Response [403]>
Does anyone know what requests
does differently from Firefox? Or should I include more headers?
I had the same problem with accessing doi.org URLs.
I eventually discovered that additional HTTP headers are needed to stop the server from rejecting the request. Specifically, all three of Accept-Language, Sec-Fetch-Site, and User-Agent must be present, or the server responds with a 403 status code.
import requests

# All three headers are required; omit any one of them and doi.org answers 403.
h = {
    "Accept-Language": "en-US",
    "Sec-Fetch-Site": "cross-site",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:138.0) Gecko/20100101 Firefox/138.0",
}

url = "https://doi.org/10.1111/tgis.70037"
response = requests.get(url, headers=h)
response.status_code  # 200
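Applying the same fix to the URL from the question, here is a minimal sketch (assuming the publisher behind that DOI accepts the same three headers) that sets them once on a requests.Session so they are sent on every request the session makes, including the redirect hops:

import requests

headers = {
    "Accept-Language": "en-US",
    "Sec-Fetch-Site": "cross-site",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:138.0) Gecko/20100101 Firefox/138.0",
}

with requests.Session() as s:
    s.headers.update(headers)  # session-level headers apply to every request
    r = s.get("https://doi.org/10.1002/jccs.200600142", allow_redirects=True)
    print(r.status_code)  # expected 200 if the target site accepts these headers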