python windows beautifulsoup find

Beautiful Soup ".find" not working when run from the Windows terminal


I'm trying to automate a program that periodically scrapes prices from Amazon and other sites (I'm starting with Amazon).

The problem is that when I call soup.find from PyCharm, it finds its target and returns it correctly, but from the Windows terminal it returns None.

The code runs fine from PyCharm, but I need it to run from the Windows terminal so I can automate it through a .bat file.

I find this a very strange issue and I couldn't find any documentation about it, so any help would be much appreciated!

There are some things I've already tried, so they can be ruled out.

I've compared the soup I get from PyCharm with the one I get from the Windows terminal, and they are different. In the Windows one I couldn't find the text I'm looking for, even when searching for it manually.

Finally, here is the code I'm using so you can see what I'm seeing:

import time
import requests
from bs4 import BeautifulSoup
import pandas as pd
import os
from csv import writer
from datetime import date, datetime
from tqdm import tqdm

def r_Amazon(URL):
    headers = {
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'es-ES,es;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
    }
    response = requests.get(URL, headers=headers)
    soup = BeautifulSoup(response.content, 'lxml')
    # Check whether the item is in stock or only available second-hand
    get_error = 0
    try:
        product_price_stat = soup.find('span', {'class': 'a-text-bold'}).text.strip() #<- HERE IT FAILS

        if product_price_stat == 'Comprar de segunda mano' or product_price_stat == 'Ofertas destacadas no disponibles':
            # The item only has a second-hand price, use the corresponding script
            try:
                product_price = 'ND'
                get_error = 1
            except:
                print('ERROR 2nd TRY')
                get_error = 1
        else:
            # The item has a normal price, use the corresponding script
            try:
                product_price = soup.find('span', {'class': 'a-offscreen'}).text.strip()
                # Format Correctly
                product_price = product_price.replace('.', '')
                product_price = product_price.replace(',', '.')
                product_price = product_price.replace('€', '')
            except:
                print('ERROR 1rs TRY')
                get_error = 1
    except:
        product_price = 'ND'
        print('ERROR')
        get_error = 1
    return product_price, get_error

Solution

  • Neither your code nor the terminal is the problem. Amazon is blocking the scraping because it thinks you are a robot (yes, even if you send browser-like headers, Amazon can detect it most of the time).

    If you print the soup inside the function at the point where the error occurs, you will get this:

    Enter the characters you see below

    Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.
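
A quick way to confirm this from the script itself is to look for the CAPTCHA text in the HTML before parsing it. The sketch below is only an illustration: the helper name looks_like_captcha and the marker strings are assumptions based on the message quoted above, and Amazon may word or localize that page differently.

```python
# Hypothetical markers, taken from the block page quoted above.
# Amazon may change or translate this text, so treat the list as a starting point.
CAPTCHA_MARKERS = (
    "enter the characters you see below",
    "not a robot",
)


def looks_like_captcha(html):
    """Return True if the HTML contains one of the known block-page phrases."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)
```

Calling looks_like_captcha(response.text) right after requests.get(...) in r_Amazon would tell you whether find returned None because of the block page rather than because of the selector.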

    I have found myself in this same mess in the past. I recommend using Selenium to get the content of the web page instead of requests.

    This is how you can do it:

    import time
    # import requests
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from bs4 import BeautifulSoup
    import pandas as pd
    import os
    from csv import writer
    from datetime import date, datetime
    from tqdm import tqdm
    
    def r_Amazon(URL):
        chrome_options = Options()
        chrome_options.add_argument('--headless') # run Chrome headless, without opening a visible window
        driver = webdriver.Chrome(options=chrome_options)
        driver.get(URL)
        soup = BeautifulSoup(driver.page_source, 'lxml')
        driver.quit()
        # Check whether the item is in stock or only available second-hand
        get_error = 0
        try:
            product_price_stat = soup.find('span', {'class': 'a-text-bold'}).text.strip()
    
            if product_price_stat == 'Comprar de segunda mano' or product_price_stat == 'Ofertas destacadas no disponibles':
                # The item only has a second-hand price, use the corresponding script
                try:
                    product_price = 'ND'
                    get_error = 1
                except:
                    print('ERROR 2nd TRY')
                    get_error = 1
            else:
                # The item has a normal price, use the corresponding script
                try:
                    product_price = soup.find('span', {'class': 'a-offscreen'}).text.strip()
                    # Format Correctly
                    product_price = product_price.replace('.', '')
                    product_price = product_price.replace(',', '.')
                    product_price = product_price.replace('€', '')
                except:
                    print('ERROR 1rs TRY')
                    get_error = 1
        except:
            product_price = 'ND'
            print('ERROR')
            get_error = 1
        os.system("cls" if os.name == 'nt' else "clear") # clear the screen before returning the output
        return product_price, get_error
    
    
    

    Make sure to install Selenium first with pip install selenium.
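
The price clean-up done inside r_Amazon (drop the thousands dot, turn the decimal comma into a dot, strip the euro sign) can also be pulled into a small helper that returns a number. This is a minimal sketch, assuming prices arrive in the Spanish format such as "1.234,56 €". The name parse_euro_price is made up for this example, and returning None on unparseable text is a choice made here, not something from the original code.

```python
def parse_euro_price(text):
    """Convert a price such as '1.234,56 €' to a float, or return None."""
    # Amazon often separates the amount and the currency with a
    # non-breaking space, so normalise it before stripping.
    cleaned = text.replace("\xa0", " ").replace("€", "").strip()
    # Remove thousands separators first, then switch the decimal comma to a dot.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        # Covers placeholders such as 'ND' and any unexpected text.
        return None
```

Keeping the result as a float (or None) instead of a string makes later comparisons between prices easier.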


    Why does Amazon do it? Most probably because they offer their own API for product data, so they don't want us to get it for free by scraping.

    One more thing: you can also do this with Selenium alone, without Beautiful Soup.
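
As a rough illustration of the Selenium-only route, the sketch below reads the price straight from the rendered page with find_elements instead of handing page_source to Beautiful Soup. The function names first_text and scrape_price_text are made up for this example, the span.a-offscreen selector is taken from the code above, and there is no guarantee that Amazon serves the real page to a headless browser. Because a-offscreen spans are visually hidden, Selenium's .text may come back empty for them, so the sketch reads the textContent property instead.

```python
def first_text(elements):
    """Return the stripped textContent of the first non-empty element, or None."""
    for element in elements:
        text = (element.get_property("textContent") or "").strip()
        if text:
            return text
    return None


def scrape_price_text(url):
    """Open the page in headless Chrome and return the raw price text, or None."""
    # Imported here so first_text can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # find_elements returns an empty list when nothing matches,
        # so no exception handling is needed for a missing price.
        return first_text(driver.find_elements(By.CSS_SELECTOR, "span.a-offscreen"))
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

The returned string still has the original formatting (for example "12,99 €"), so it needs the same clean-up as in r_Amazon before being used as a number.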