I LIVE IN GREECE / I HAVE A GREEK IP
I'm trying to scrape a website with Python and the requests library, but I've noticed that the requests connect to a US server instead of a Greek one. Additionally, the content I get back is not in Greek.
I've set the headers and user-agent to mimic a Greek user, but it doesn't seem to have any effect. Here's the Python script I'm using:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import time
import firebase_admin
from firebase_admin import credentials, db
import asyncio

# Initialize Firebase with your credentials JSON file and database URL
cred = credentials.Certificate("C:\\Users\\alexl\\OneDrive\\Desktop\\Cook Group\\Scripts\\SNS\\creds.json")
firebase_admin.initialize_app(cred, {
    'databaseURL': 'https://sns-database-default-rtdb.europe-west1.firebasedatabase.app/'
})

# Function to load data from Firebase
# (snipped for brevity)

# Function to save data to Firebase
# (snipped for brevity)

# Your bot should be running inside an async function
async def main():
    while True:
        print("Refreshing data...")  # Debug message

        # Specify the URL you want to scrape
        url = "https://www.sneakersnstuff.com/en/176/nike-dunk"
        headers = {
            'accept-language': 'en-GR-0-0',
            "sns.state": "en-GR-0-0",
            "Cookie": "sns.state=en-GR-0-0",
            'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }

        # Load the previously scraped data from Firebase
        # (snipped for brevity)

        # Send an HTTP GET request to the URL
        response = requests.get(url, headers=headers)

        # Check if the request was successful (status code 200)
        # (snipped for brevity)

        print("Waiting for the next refresh...")  # Debug message

        # Wait for 60 seconds before the next refresh
        await asyncio.sleep(60)

# Ensure that the main function is run
if __name__ == "__main__":
    asyncio.run(main())
Here are the website's cookies: [cookie screenshot omitted]

What should I do?
Set the cookie PreferredRegion:
import requests
from bs4 import BeautifulSoup

url = "https://www.sneakersnstuff.com/en/176/nike-dunk"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0"
}

# The PreferredRegion cookie tells the site which regional store to serve;
# 2358 selects Greece (the region check below prints "GR:en").
cookies = {"PreferredRegion": "2358"}

soup = BeautifulSoup(
    requests.get(url, headers=headers, cookies=cookies).content, "html.parser"
)

# Confirm which region the server resolved by reading the link
# that follows the "Region" heading in the page.
region = soup.select_one('h3:-soup-contains("Region") + a').text
print(region)
Prints:
GR:en
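If you want the region pinned across your refresh loop, you can attach the cookie to a requests.Session so every request sends it automatically. Below is a minimal sketch that folds this into your main() loop (Firebase calls left out). It assumes Python 3.9+ for asyncio.to_thread, and it reuses the same region selector and cookie value as above, which may stop working if the site changes its markup or region IDs:

import asyncio
import requests
from bs4 import BeautifulSoup

URL = "https://www.sneakersnstuff.com/en/176/nike-dunk"

# One session reused for every refresh: the PreferredRegion cookie
# (2358 = Greece, as shown above) is sent with each request.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0"
})
session.cookies.set("PreferredRegion", "2358")

async def main():
    while True:
        print("Refreshing data...")  # Debug message

        # requests is blocking, so run the call in a worker thread
        # to avoid stalling the event loop
        response = await asyncio.to_thread(session.get, URL)
        soup = BeautifulSoup(response.content, "html.parser")

        # Same region check as above; prints "GR:en" when Greece is active
        region = soup.select_one('h3:-soup-contains("Region") + a')
        print(region.text if region else "region link not found")

        # Wait for 60 seconds before the next refresh
        await asyncio.sleep(60)

if __name__ == "__main__":
    asyncio.run(main())

Using a session also reuses the underlying TCP connection between refreshes, which is slightly faster and closer to how a real browser behaves.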