python, web-scraping, beautifulsoup, espn

BeautifulSoup ESPN: Scraping sports scores, but .findAll gives an empty ResultSet. How do I pull the proper info?


Beginning Python and BeautifulSoup user here.

I'm trying to scrape some sports scores from the ESPN website, but the returns are empty.

Sample Target: ESPN Website > NBA > Scores

I want to get info such as Team Name, Score, Record, and Quarter/Final, but since I'm having trouble I'll start with just the Score. I would like to get the total score for each team.

from bs4 import BeautifulSoup as bs
from urllib.request import urlopen as uReq

html_url = 'http://www.espn.co.uk/nba/scoreboard'

uClient = uReq(html_url)
page_html = uClient.read()
uClient.close()

page_soup = bs(page_html, 'html.parser')
containers = page_soup.findAll('td', {"class": "total"})

print(len(containers))
print(type(containers))

Output

0
<class 'bs4.element.ResultSet'>

I spent the whole day trying to figure out why all my results keep coming back as NoneType or empty, and I can't seem to work it out.

I tried just looking for 'td' and this is the result

containers = page_soup.findAll('td')

print(len(containers))
print(type(containers))

Output

0
<class 'bs4.element.ResultSet'>

I'm not sure why I'm unable to pull the data. Is something going on behind the scenes, i.e. is ESPN deliberately preventing scraping? I have tried looking through different tags, attributes, etc., but can't figure it out. Thank you.
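
One more check I tried was looking at the raw response itself rather than what BeautifulSoup parsed out of it, roughly like this (continuing from the script above; the file name is just something I picked):

page_text = page_html.decode('utf-8', errors='replace')

# Is the table markup I'm expecting anywhere in the raw HTML at all?
print(len(page_text))
print('<td' in page_text)
print('class="total"' in page_text)

# Save a copy so I can open it in a browser and compare it with the live page.
with open('espn_raw.html', 'w', encoding='utf-8') as f:
    f.write(page_text)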


Solution

  • I believe the problem you're encountering is that the page content is rendered dynamically with JavaScript. A plain urlopen request only returns the initial HTML, which doesn't contain that information, but you might want to look at this post on using Selenium and BeautifulSoup together to parse dynamic web content. Try running the code below to get the scores you were looking for:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    
    # Let a real browser load the page so the JavaScript-rendered
    # scoreboard is actually present in the DOM.
    driver = webdriver.Firefox()
    driver.get("http://www.espn.co.uk/nba/scoreboard")
    
    # Hand the fully rendered HTML over to BeautifulSoup.
    html = driver.page_source
    driver.quit()
    
    soup = BeautifulSoup(html, "lxml")
    
    for tag in soup.find_all("td", {"class": "total"}):
        print(tag.text)
    

    This produces the following output:

    87
    99
    106
    102
    123
    131
    
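    If you want to take it a step further and group those totals by game, a small sketch along these lines should work; I'm assuming the two "total" cells for a game appear next to each other in the markup, and which one is the away side is a guess on my part:

    totals = [tag.text for tag in soup.find_all("td", {"class": "total"})]
    
    # Each game contributes two consecutive "total" cells, so pair them up.
    for away, home in zip(totals[0::2], totals[1::2]):
        print(away, "-", home)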

    You may need to look at this post on installing Selenium; note that the browser driver itself (geckodriver, for Firefox) also has to be on your system PATH in order for the script to work.
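
    If you'd rather not touch the PATH at all, newer Selenium releases (4.x) also let you point at the geckodriver binary explicitly; something along these lines should work (the path below is just a placeholder for wherever you saved the driver):

    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service
    
    # Tell Selenium where the geckodriver executable lives instead of
    # relying on it being discoverable via the system PATH.
    service = Service(executable_path="/path/to/geckodriver")
    driver = webdriver.Firefox(service=service)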

    EDIT: Updated to specify the lxml HTML parser recommended by the BeautifulSoup documentation for its speed.