
Why is my HTML parser not outputting the number I want?


My programming teacher had us write a Python calculator for fuel consumption in L/100 km, and I decided to go further and have it also calculate the price per 100 km. Here's the thing: I tried to make an HTML parser with BeautifulSoup 4 (bs4) that finds the gas price for me and updates it if it ever changes on the website. I found the CSS selector for the number, but I'm not sure whether the selector is wrong or something else in the parser is, because when I run it, it prints "Initial number: None" instead of the number the CSS selector points to. Here's the code for my parser:

import requests
from bs4 import BeautifulSoup
import time

# URL of the website to monitor
url = 'https://nbeub.ca/index.php?page=current-petroleum-prices-2'

# Function to fetch the number from the website
def fetch_number():
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Adjust the selector to find the specific number
    number = soup.select_one('body > table > tbody > tr:nth-child(5) > td > table > tbody > tr > td > table > tbody > tr > td:nth-child(3) > table > tbody > tr:nth-child(3) > td:nth-child(2)')
    return str(number)

# Main monitoring function
def monitor():
    last_number = fetch_number()
    print(f"Initial number: {last_number}")

    while True:
        time.sleep(2592000)  # Wait for 30 days before checking again
        current_number = fetch_number()
        
        if current_number != last_number:
            print(f"Number updated: {current_number}")
            last_number = current_number

# Start monitoring
monitor()

My terminal behaves normally and asks me for inputs, but when it gets to the parser it just prints "Initial number: None" and then doesn't run the rest of my code.
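The rest of the program never runs because monitor() never returns: the while True loop has no break, so after printing the initial value it sleeps for 30 days, checks again, and repeats forever, and nothing after the monitor() call is ever reached. For a calculator that runs once, a simpler option is to read the price a single time and carry on. A minimal stdlib-only sketch, assuming a fetcher that returns the price cell's text (for example cell.get_text(strip=True) rather than str(cell)) and that the cell holds a plain number; load_price, fake_fetch and the value 152.9 are hypothetical:

```python
def load_price(fetch):
    """Fetch the price once and return it as a float (no monitoring loop).

    fetch is any zero-argument callable that returns the price cell's text.
    Assumes the text is a plain number such as "152.9" (hypothetical format).
    """
    text = fetch()
    # The question's fetch_number() returns str(None) == "None" when the
    # selector matches nothing, so treat that the same as a real None.
    if text is None or text == "None":
        raise ValueError("CSS selector matched nothing on the page")
    return float(text)


def fake_fetch():
    # Hypothetical stand-in for a real fetch so the sketch runs offline.
    return "152.9"


price = load_price(fake_fetch)
print(f"Initial number: {price}")
# ...the rest of the calculator continues here, using price
```

In the real program the call would be price = load_price(fetch_number), with a fetch_number that returns the cell's text.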


Solution

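  • Most of the tables on that page don't have <tbody> elements; the <tr> is nested directly inside <table>. Browsers insert a <tbody> when they build the DOM, so a selector copied from the developer tools includes it, but BeautifulSoup's html.parser works on the raw HTML and doesn't add one. Instead of using all those direct child selectors, use descendant selectors without > tbody > so the selector matches either way: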

    number = soup.select_one('body > table tr:nth-child(5) > td > table tr > td > table tr > td:nth-child(3) > table tr:nth-child(3) > td:nth-child(2)')
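A small offline demonstration of the difference, using a hypothetical one-row table in place of the real page (the markup and the 152.9 value are made up). It also shows why the question's return str(number) is worth changing: str() on a matched tag returns its full HTML, while get_text(strip=True) returns just the text.

```python
from bs4 import BeautifulSoup

# Hypothetical page: <tr> sits directly inside <table>, with no <tbody>,
# which is how the tables on the target site are described above.
HTML = """
<html><body>
<table>
  <tr><td>Regular</td><td>152.9</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Selector in the style copied from browser dev tools: it expects <tbody>,
# which html.parser never inserts, so nothing matches.
with_tbody = soup.select_one("body > table > tbody > tr > td:nth-child(2)")

# Descendant combinator (a space) matches whether or not <tbody> exists.
without_tbody = soup.select_one("body > table tr > td:nth-child(2)")

print(with_tbody)                          # None
print(without_tbody)                       # <td>152.9</td>
print(without_tbody.get_text(strip=True))  # 152.9
```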