I'm using aiohttp to asynchronously scrape prices from a URL. Previously I did the same thing synchronously with requests.get. Scraping works with requests.get, but the same URL returns a 403 Forbidden error when I request it with aiohttp. I have tried to find the cause, without success so far. The specific URL matters, because it is this site's URLs that return the 403.
I tried disabling aiohttp's URL normalization by passing a yarl.URL built with encoded=True, but it still doesn't work.
import requests
import asyncio
from aiohttp import ClientSession
import yarl

url = 'https://www.yescapa.fr/s?seatbelts=4&beds=4&km_unlimited=true&less_than_five=true&cooking=true&sink=true&fridge=true&wc=true&heating=true&types=4&longitude=-0.58046&latitude=44.84135&radius=50000&date_from=2024-08-01&date_to=2024-08-29&page=1'

res = requests.get(url)
print(res.status_code)  # getting a 200 response

async def test(url):
    async with ClientSession() as session:
        # encoded=True tells yarl not to re-encode the URL
        url = yarl.URL(url, encoded=True)
        async with session.request(method="GET", url=url) as response:
            return response.status  # getting a 403 response

print(asyncio.run(test(url)))
What am I doing wrong?
I hope I get the solution. Thanks.
It looks like the two libraries (requests and aiohttp) send different default headers. If I copy the headers from the successful requests call into the aiohttp session, it works:
import requests
import asyncio
from aiohttp import ClientSession
import yarl

url = 'https://www.yescapa.fr/s?seatbelts=4&beds=4&km_unlimited=true&less_than_five=true&cooking=true&sink=true&fridge=true&wc=true&heating=true&types=4&longitude=-0.58046&latitude=44.84135&radius=50000&date_from=2024-08-01&date_to=2024-08-29&page=1'

res = requests.get(url)
print(res.status_code)  # getting a 200 response

# Reuse the headers that requests actually sent
headers = {
    'User-Agent': res.request.headers['User-Agent'],
    'Accept': res.request.headers['Accept'],
    'Accept-Encoding': res.request.headers['Accept-Encoding'],
    'Connection': 'keep-alive',
}

async def test(url):
    async with ClientSession(headers=headers) as session:
        url = yarl.URL(url, encoded=True)
        async with session.request(method="GET", url=url) as response:
            return response.status  # getting a 200 response now as well

print(asyncio.run(test(url)))
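
If you would rather not make a synchronous request first just to read its headers, the same defaults can be taken from requests directly. The following is only a sketch under a few assumptions: the helper names default_requests_headers and fetch_status are made up for illustration, and I am assuming the server's decision depends on the request headers (most likely the User-Agent, which I have not confirmed for this site). It also returns the headers aiohttp actually sent, via response.request_info, so you can compare them with what requests sends.

```python
import asyncio

import requests
from aiohttp import ClientSession
from aiohttp.http import SERVER_SOFTWARE  # the User-Agent aiohttp sends by default


def default_requests_headers() -> dict:
    """Return the headers requests sends by default, as a plain dict.

    Hypothetical helper: it only wraps requests.utils.default_headers().
    """
    return dict(requests.utils.default_headers())


async def fetch_status(url: str, headers: dict) -> tuple:
    """GET url with the given headers.

    Returns the status code and the headers aiohttp actually sent,
    which is handy for comparing against the requests defaults.
    """
    async with ClientSession(headers=headers) as session:
        async with session.get(url) as response:
            return response.status, dict(response.request_info.headers)


# Offline comparison of the two default User-Agent strings
print("requests:", default_requests_headers()["User-Agent"])
print("aiohttp: ", SERVER_SOFTWARE)
```

With the url from the question, asyncio.run(fetch_status(url, default_requests_headers())) should then behave like the snippet above, assuming the headers really are what the site filters on. If it still returns 403, printing the second element of the returned tuple shows exactly which headers went out.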