[SOLVED] BeautifulSoup 4 - Web scraping soccer matches for 'today'

I'm very new to Python and am trying to scrape the soccer matches for 'today' from the Fox Sports website: https://www.foxsports.com/scores/soccer. Unfortunately, I keep running into this error:

AttributeError: 'NoneType' object has no attribute 'find_all'

and can't seem to get the teams for that day. This is what I have so far:

import bs4 
import requests 

res = requests.get('https://www.foxsports.com/scores/soccer')
soup = bs4.BeautifulSoup(res.text, 'html.parser') 
results = soup.find("div", class_="scores-date") 
games = results.find("div", class_="scores") 

print(games)

Solution

What happens?

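The content is not static; the website serves it dynamically, so requests does not receive the information you can see in your browser's dev tools. The find() call therefore returns None, which is what triggers the AttributeError.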
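
To see this without hitting the site, here is a minimal sketch that uses a made-up HTML string in place of the real response. The markup is an assumption for illustration, not the actual Fox Sports page. It shows find() coming back empty and one way to guard against it:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests receives: the scores
# container is missing because it is only added after the page loads.
static_html = "<html><body><div id='app'></div></body></html>"

soup = BeautifulSoup(static_html, "html.parser")
results = soup.find("div", class_="scores-date")

# find() returns None when nothing matches, so check before chaining calls.
if results is None:
    games = []
    print("No .scores-date element in the static HTML")
else:
    games = results.find_all("div", class_="scores")
```

A guard like this does not make the data appear, but it turns the confusing AttributeError into a clear message that the element is not in the downloaded HTML.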

How to fix?

Use an API, if one is provided, or Selenium, which handles content like a browser and can give you the page_source you are looking for.

Because not all of the content is available immediately, you have to use Selenium waits to detect the presence of the <span> with class "title-text".

Example

Note: the example uses Selenium 4, so check your version and either update or adapt the required dependencies to a lower version yourself.

from bs4 import BeautifulSoup 
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

service = ChromeService(executable_path='ENTER YOUR PATH TO CHROMEDRIVER')
driver = webdriver.Chrome(service=service)
driver.get('https://www.foxsports.com/scores/soccer')

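# Wait up to 10 seconds for the "Today" heading to be present in the DOM
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//span[contains(@class, "title-text") and text() = "Today"]')))

# Parse the rendered page (the 'lxml' parser requires the lxml package)
soup = BeautifulSoup(driver.page_source, 'lxml')

# Score chips inside the <div> that directly follows a .scores-date
# element containing no nested <div>
for g in soup.select('.scores-date:not(:has(div)) + div .score-chip-content'):
    print(list(g.stripped_strings))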

Output

['SERIE A', 'JUVENTUS', '9-4-5', 'JUV', '9-4-5', 'CAGLIARI', '1-7-10', 'CAG', '1-7-10', '8:45PM', 'Paramount+', 'JUV -455', 'CAG +1100']
['LG CUP', 'ARSENAL', '0-0-0', 'ARS', '0-0-0', 'SUNDERLAND', '0-0-0', 'SUN', '0-0-0', '8:45PM', 'ARS -454', 'SUN +1243']
['LA LIGA', 'SEVILLA', '11-4-2', 'SEV', '11-4-2', 'BARCELONA', '7-6-4', 'BAR', '7-6-4', '9:30PM', 'ESPN+', 'SEV +155', 'BAR +180']
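
If you only need the teams, each list can be reduced to a few fields. This is a minimal sketch that assumes the positions seen in the sample output above (league at index 0, the two team names at indexes 1 and 5, kickoff time at index 9). The helper name summarize and the dictionary keys are made up for illustration, and the layout may differ for live or finished matches:

```python
def summarize(fields):
    """Pick league, teams, and kickoff time out of one stripped_strings list.

    Assumes the layout of the sample output: league at index 0,
    team names at indexes 1 and 5, kickoff time at index 9.
    """
    if len(fields) < 10:
        return None  # layout differs from the sample; skip rather than guess
    return {
        "league": fields[0],
        "team_1": fields[1],
        "team_2": fields[5],
        "time": fields[9],
    }


# First row of the sample output above
row = ['SERIE A', 'JUVENTUS', '9-4-5', 'JUV', '9-4-5', 'CAGLIARI',
       '1-7-10', 'CAG', '1-7-10', '8:45PM', 'Paramount+',
       'JUV -455', 'CAG +1100']

print(summarize(row))
# {'league': 'SERIE A', 'team_1': 'JUVENTUS', 'team_2': 'CAGLIARI', 'time': '8:45PM'}
```

Selecting the individual elements by class would be more robust than relying on list positions, but that requires inspecting the rendered markup.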