When I access https://www.getfpv.com/media/sitemap.xml from my browser it works, but when I request it with Python it returns 403 Forbidden. How does the website know the request is coming from Python rather than my browser? I copied all of the browser's headers, so the requests should be identical. It isn't JavaScript or cookies: when I disabled both in Safari, the page still loaded. My code is below.
url = "https://www.getfpv.com/media/sitemap.xml"
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Priority': 'u=0, i',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'none',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36'
}
r = requests.get(url, headers=headers)
r
<Response [403]>
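One way to check what actually went out: requests keeps the prepared request on the response object, so you can print the headers it really sent. Note that requests merges in its own defaults, such as Connection: keep-alive, which a browser speaking HTTP/2 would not send. A quick sanity-check sketch (header order on the wire may still differ from what this prints):

import requests

url = "https://www.getfpv.com/media/sitemap.xml"
r = requests.get(url, headers=headers)  # `headers` is the dict defined above
print(r.status_code)                    # 403 in this case
for key, value in r.request.headers.items():
    print(f"{key}: {value}")            # the headers requests actually sent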
It looks like the site is filtering on the User-Agent header. I am able to fetch it via:
import requests

sess = requests.Session()
sess.headers['User-Agent'] = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:135.0) '
    'Gecko/20100101 Firefox/135.0'
)
res = sess.get('https://www.getfpv.com/media/sitemap.xml')
print(res)
# <Response [200]>
Behaviour also seems to vary with the client's origin, so the same request may be blocked from one network and allowed from another.
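If you want to confirm that it really is the User-Agent being filtered, a small test sketch like the following (my own harness, not from the original post) tries a few UA strings against the same URL. The exact status codes will depend on the server's rules at the time you run it:

import requests

url = 'https://www.getfpv.com/media/sitemap.xml'
user_agents = {
    'requests-default': requests.utils.default_headers()['User-Agent'],
    'firefox': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:135.0) '
               'Gecko/20100101 Firefox/135.0',
    'chrome': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36',
}
# Send the same request with only the User-Agent varied and compare results.
for name, ua in user_agents.items():
    r = requests.get(url, headers={'User-Agent': ua})
    print(f'{name}: {r.status_code}')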