Tags: python, python-3.x, web-scraping, beautifulsoup

Using web scraping to check if an item is in stock


I am creating a Python program that uses web scraping to check if an item is in stock. The code is a Python 3.9 script that uses Beautiful Soup 4 and requests to scrape the item's availability. Eventually I would like the program to search multiple websites, and multiple links within each site, so I don't need a bunch of scripts running at once. The expected output of the program is this:
200
0
In Stock
But I am getting:
200
[]
Out Of Stock

The '200' is the HTTP status code showing whether the script can reach the server; 200 is the expected result. The '0' is meant to be a boolean showing whether the item is in stock; the expected response is '0' for In Stock. I have tried both in-stock and out-of-stock items, and both give the same response: 200, [], Out Of Stock. I suspect something is wrong with out_of_stock_divs inside check_item_in_stock, because that is where the [] comes from when it looks for the item's availability.

I had the code working correctly yesterday, but I kept adding features (such as scraping multiple links and different websites), which broke it, and I can't get it back to a working state.

Here's the program code. (I based it on Mr. Arya Boudaie's code from his website, https://aryaboudaie.com/. I removed his text notifications because I plan to run this on a spare computer next to me and have it play a loud sound, which I will implement later.)

from bs4 import BeautifulSoup
import requests
import time  # needed for time.sleep() in the polling loop at the bottom

def get_page_html(url):
    headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"}
    page = requests.get(url, headers=headers)
    print(page.status_code)
    return page.content


def check_item_in_stock(page_html):
    soup = BeautifulSoup(page_html, 'html.parser')
    out_of_stock_divs = soup.findAll("text", {"class": "product-inventory"})
    print(out_of_stock_divs)
    return len(out_of_stock_divs) != 0

def check_inventory():
    url = "https://www.newegg.com/hp-prodesk-400-g5-nettop-computer/p/N82E16883997492?Item=9SIA7ABC996974"
    page_html = get_page_html(url)
    if check_item_in_stock(page_html):
        print("In stock")
    else:
        print("Out of stock")

while True:
    check_inventory()
    time.sleep(60)

Solution

  • The product inventory status is located inside a <div> tag, not a <text> tag:

    import requests
    from bs4 import BeautifulSoup
    
    
    def get_page_html(url):
        headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"}
        page = requests.get(url, headers=headers)
        print(page.status_code)
        return page.content
    
    
    def check_item_in_stock(page_html):
        soup = BeautifulSoup(page_html, 'html.parser')
        out_of_stock_divs = soup.findAll("div", {"class": "product-inventory"})  # <--- change "text" to div
        print(out_of_stock_divs)
        return len(out_of_stock_divs) != 0
    
    def check_inventory():
        url = "https://www.newegg.com/hp-prodesk-400-g5-nettop-computer/p/N82E16883997492?Item=9SIA7ABC996974"
        page_html = get_page_html(url)
        if check_item_in_stock(page_html):
            print("In stock")
        else:
            print("Out of stock")
    
    check_inventory()
    

    Prints:

    200
    [<div class="product-inventory"><strong>In stock.</strong></div>]
    In stock
    

    Note: The site's HTML markup has probably changed at some point. I'd also modify the check_item_in_stock function so that it checks the element's text:

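    def check_item_in_stock(page_html):
        soup = BeautifulSoup(page_html, 'html.parser')
        out_of_stock_div = soup.find("div", {"class": "product-inventory"})
        # find() returns None when the element is missing, so guard before reading .text
        if out_of_stock_div is None:
            return False
        return out_of_stock_div.text.strip() == "In stock."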
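
Since the question also mentions wanting to check several links and sites from one script, here is a minimal sketch of how the text-based check could be extended to do that. It is only an illustration under some assumptions: the names `is_in_stock`, `check_pages`, and `fetch` are hypothetical, the example.com URLs are placeholders, and the "OUT OF STOCK." markup is a guess at what a sold-out page might contain (only the "In stock." markup appears in the output printed above). The sample HTML is inlined so the sketch runs without hitting the site.

```python
from bs4 import BeautifulSoup

# Sample markup. The in-stock snippet matches the output printed above;
# the sold-out snippet is an ASSUMPTION about what the site might return.
SAMPLE_IN_STOCK = '<div class="product-inventory"><strong>In stock.</strong></div>'
SAMPLE_SOLD_OUT = '<div class="product-inventory"><strong>OUT OF STOCK.</strong></div>'


def is_in_stock(page_html, tag="div", css_class="product-inventory"):
    """Return True only if the inventory element exists and reads 'In stock.'."""
    soup = BeautifulSoup(page_html, "html.parser")
    element = soup.find(tag, class_=css_class)
    if element is None:
        # Wrong tag name or changed markup: treat as not in stock.
        return False
    return element.get_text(strip=True).lower() == "in stock."


def check_pages(pages, fetch):
    """Map each URL in `pages` to its stock status.

    `pages` maps a URL to the (tag, css_class) pair used by that site.
    `fetch` is any callable that returns the HTML for a URL.
    """
    results = {}
    for url, (tag, css_class) in pages.items():
        results[url] = is_in_stock(fetch(url), tag, css_class)
    return results


if __name__ == "__main__":
    # Stand-in for a real HTTP fetch so the sketch runs offline.
    fake_site = {
        "https://example.com/item-a": SAMPLE_IN_STOCK,
        "https://example.com/item-b": SAMPLE_SOLD_OUT,
    }
    pages = {url: ("div", "product-inventory") for url in fake_site}
    for url, in_stock in check_pages(pages, fake_site.get).items():
        print(url, "In stock" if in_stock else "Out of stock")
```

In the real script, `fetch` could simply be the `get_page_html` function from the question, and the `while True` / `time.sleep(60)` loop could call `check_pages` once per pass. Passing `tag="text"` to `is_in_stock` reproduces the original symptom: no element is found, so every page reports as out of stock.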