I'm trying to automate a program that periodically scrapes prices from Amazon and other pages (I'm starting with Amazon).
The problem is that when I run the soup.find call from PyCharm, it finds its target and returns it correctly, but from the Windows terminal it returns None.
The code runs fine from PyCharm, but I need it to run from the Windows terminal so I can automate it through a .bat file.
I find this a very strange issue and I couldn't find any documentation about it, so if any of you could help me with it, that would be awesome!
There are some things I've already tried, so they can be ruled out.
I've compared the soup I get from PyCharm with the one I get from the Windows terminal and they are different; in the Windows one I couldn't find the text I'm looking for when searching manually.
Finally, here is the code I'm using so you can see what I'm seeing:
import time
import requests
from bs4 import BeautifulSoup
import pandas as pd
import os
from csv import writer
from datetime import date, datetime
from tqdm import tqdm

def r_Amazon(URL):
    headers = {
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'es-ES,es;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
    }
    response = requests.get(URL, headers=headers)
    soup = BeautifulSoup(response.content, 'lxml')
    # Check whether the item is in stock or only available second-hand
    get_error = 0
    try:
        product_price_stat = soup.find('span', {'class': 'a-text-bold'}).text.strip() #<- HERE IT FAILS
        if product_price_stat == 'Comprar de segunda mano' or product_price_stat == 'Ofertas destacadas no disponibles':
            # The item only has a SECOND-HAND price, use the corresponding script
            try:
                product_price = 'ND'
                get_error = 1
            except:
                print('ERROR 2nd TRY')
                get_error = 1
        else:
            # The item has a NORMAL price, use the corresponding script
            try:
                product_price = soup.find('span', {'class': 'a-offscreen'}).text.strip()
                # Format correctly
                product_price = product_price.replace('.', '')
                product_price = product_price.replace(',', '.')
                product_price = product_price.replace('€', '')
            except:
                print('ERROR 1rs TRY')
                get_error = 1
    except:
        product_price = 'ND'
        print('ERROR')
        get_error = 1
    return product_price, get_error
The problem is neither your code nor the terminal. Amazon is simply not letting you scrape because it thinks you are a robot (yes, even if you send headers, Amazon can detect it most of the time).
If you print the soup inside the function at the point where the error occurs, you will get this:
Enter the characters you see below
Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.
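A quick way to confirm this from the script itself is to look for that text in the response before parsing it. The following is only a minimal sketch: the function name looks_like_robot_check and the marker list are my own, and the phrases are copied from the page quoted above, so they may differ if Amazon changes or localizes the page.

```python
# Phrases taken from the robot-check page quoted above (lower-cased).
# Assumption: Amazon may change or translate these at any time.
ROBOT_CHECK_MARKERS = (
    "enter the characters you see below",
    "not a robot",
)


def looks_like_robot_check(html):
    """Return True if the HTML text contains one of the robot-check phrases."""
    lowered = html.lower()
    return any(marker in lowered for marker in ROBOT_CHECK_MARKERS)
```

Inside r_Amazon you could call looks_like_robot_check(response.text) right after requests.get and print a clearer message than a generic 'ERROR' when it returns True.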
I've been in this same mess in the past. I recommend using Selenium instead of requests to get the content of the web page.
This is how you can do it:
import time
# import requests
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pandas as pd
import os
from csv import writer
from datetime import date, datetime
from tqdm import tqdm

def r_Amazon(URL):
    chrome_options = Options()
    chrome_options.add_argument('--headless')  # run Chrome without opening a visible window
    driver = webdriver.Chrome(options=chrome_options)
    driver.get(URL)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    driver.quit()
    # Check whether the item is in stock or only available second-hand
    get_error = 0
    try:
        product_price_stat = soup.find('span', {'class': 'a-text-bold'}).text.strip()
        if product_price_stat == 'Comprar de segunda mano' or product_price_stat == 'Ofertas destacadas no disponibles':
            # The item only has a SECOND-HAND price, use the corresponding script
            try:
                product_price = 'ND'
                get_error = 1
            except Exception:
                print('ERROR 2nd TRY')
                get_error = 1
        else:
            # The item has a NORMAL price, use the corresponding script
            try:
                product_price = soup.find('span', {'class': 'a-offscreen'}).text.strip()
                # Format correctly
                product_price = product_price.replace('.', '')
                product_price = product_price.replace(',', '.')
                product_price = product_price.replace('€', '')
            except Exception:
                # Set a value here too, otherwise the return below fails with an unbound variable
                product_price = 'ND'
                print('ERROR 1rs TRY')
                get_error = 1
    except Exception:
        product_price = 'ND'
        print('ERROR')
        get_error = 1
    os.system("cls" if os.name == 'nt' else "clear")  # clear the screen before returning the output
    return product_price, get_error
Make sure to install Selenium with pip install selenium.
Why does Amazon do this? Most probably because they have their own API, so they don't want us to scrape the data for free.
One more thing: you can also do this using only Selenium, without BeautifulSoup.
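Here is a rough sketch of what a Selenium-only version could look like. The names normalize_price and r_amazon_selenium_only are hypothetical, it only covers the normal-price branch (not the second-hand check), and I'm assuming the span.a-offscreen selector from the code above still matches the price. Since a-offscreen elements are visually hidden, it reads textContent rather than .text, which may come back empty for hidden elements.

```python
def normalize_price(raw):
    """Turn a Spanish-formatted price such as '1.234,56 €' into '1234.56'."""
    # Same three replacements as in the code above, plus a final strip.
    return raw.replace('.', '').replace(',', '.').replace('€', '').strip()


def r_amazon_selenium_only(url):
    """Hypothetical variant of r_Amazon that reads the price with Selenium alone."""
    # Imported here so normalize_price can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument('--headless')  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the first span.a-offscreen on the page holds the price.
        element = driver.find_element(By.CSS_SELECTOR, 'span.a-offscreen')
        # textContent returns the raw text even if the element is not visible.
        raw = element.get_attribute('textContent')
        return normalize_price(raw), 0
    except NoSuchElementException:
        # Selector not found: robot check, layout change, or no regular price.
        return 'ND', 1
    finally:
        driver.quit()
```

It returns the same (product_price, get_error) pair as r_Amazon, so it should be possible to swap it in without touching the rest of the script.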