
BeautifulSoup 4 - Web scraping soccer matches for 'today'


I'm very new to Python and am trying to scrape today's soccer matches from the Fox Sports website: https://www.foxsports.com/scores/soccer. Unfortunately, I keep running into the error

AttributeError: 'NoneType' object has no attribute 'find_all'

and can't seem to get the teams for that day. This is what I have so far:

import bs4
import requests

res = requests.get('https://www.foxsports.com/scores/soccer')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
results = soup.find("div", class_="scores-date")
games = results.find("div", class_="scores")

print(games)

Solution

  • What happens?

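    The content is not static. The website loads it dynamically with JavaScript, so requests only receives the initial HTML and not the information you can see in your dev tools. The lookup for the "scores-date" <div> therefore finds nothing and returns None, and the next call on that None raises the AttributeError.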
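
    As a minimal sketch of that failure mode, the snippet below parses a made-up HTML string (an assumption standing in for whatever requests actually receives) that does not contain the "scores-date" container. It shows that find() returns None rather than raising, and that a simple guard avoids the AttributeError:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that arrives before any JavaScript
# has run: the "scores-date" container is simply not there.
static_html = "<html><body><div id='app'></div></body></html>"

soup = BeautifulSoup(static_html, "html.parser")

# find() returns None when nothing matches; it does not raise.
results = soup.find("div", class_="scores-date")
print(results)

# Calling a method on that None is what raises the AttributeError,
# so check for None before chaining another lookup.
if results is None:
    games = []
else:
    games = results.find_all("div", class_="scores")

print(games)
```

    The guard only makes the script fail gracefully. It does not make the missing content appear.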

    How to fix?

    Use an API, if the site provides one, or Selenium, which renders the content like a browser and can provide the page_source you are looking for.

    Because not all of the content is available immediately, you have to use Selenium waits to detect the presence of the <span> with class "title-text".

    Example

    Note: The example uses Selenium 4, so check your version and either update or adapt the required dependencies to a lower version yourself.

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service as ChromeService
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    service = ChromeService(executable_path='ENTER YOUR PATH TO CHROMEDRIVER')
    driver = webdriver.Chrome(service=service)
    driver.get('https://www.foxsports.com/scores/soccer')

    # Wait up to 10 seconds until the "Today" header has been rendered
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//span[contains(@class, "title-text") and text() = "Today"]')))

    # Parse the rendered HTML (the 'lxml' parser requires the lxml package)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    driver.quit()  # the browser is no longer needed once the HTML is captured

    # Select every score chip inside the <div> that immediately follows
    # a .scores-date element that contains no <div> of its own
    for g in soup.select('.scores-date:not(:has(div)) + div .score-chip-content'):
        print(list(g.stripped_strings))
    

    Output

    ['SERIE A', 'JUVENTUS', '9-4-5', 'JUV', '9-4-5', 'CAGLIARI', '1-7-10', 'CAG', '1-7-10', '8:45PM', 'Paramount+', 'JUV -455', 'CAG +1100']
    ['LG CUP', 'ARSENAL', '0-0-0', 'ARS', '0-0-0', 'SUNDERLAND', '0-0-0', 'SUN', '0-0-0', '8:45PM', 'ARS -454', 'SUN +1243']
    ['LA LIGA', 'SEVILLA', '11-4-2', 'SEV', '11-4-2', 'BARCELONA', '7-6-4', 'BAR', '7-6-4', '9:30PM', 'ESPN+', 'SEV +155', 'BAR +180']
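
    If you want named fields instead of flat lists, a rough sketch could look like the following. The helper parse_match is hypothetical, and the index positions are an assumption read off the sample output above (league at index 0, the two team names at indexes 1 and 5, kickoff time at index 9). They may not hold for live or finished matches:

```python
def parse_match(strings):
    """Turn one list of stripped strings into a small dict.

    Assumes the layout seen in the sample output: league first, team
    names at indexes 1 and 5, kickoff time at index 9.
    """
    if len(strings) < 10:
        return None  # layout differs from the sample, so skip this entry
    return {
        "league": strings[0],
        "team_1": strings[1],
        "team_2": strings[5],
        "kickoff": strings[9],
    }


# One row copied from the output above
sample = ['LG CUP', 'ARSENAL', '0-0-0', 'ARS', '0-0-0', 'SUNDERLAND',
          '0-0-0', 'SUN', '0-0-0', '8:45PM', 'ARS -454', 'SUN +1243']

match = parse_match(sample)
print(match)
```

    Inside the loop from the example you would call parse_match(list(g.stripped_strings)) and skip any None results.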