python beautifulsoup domain-data-modelling

Web Scraping Fbref table


My code so far works for a different table on the FBref website, but I am struggling to get the player details. The code below:

import requests
from bs4 import BeautifulSoup, Comment


url = 'https://fbref.com/en/squads/18bb7c10/Arsenal-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

table = BeautifulSoup(soup.select_one('#stats_standard').find_next(text=lambda x: isinstance(x, Comment)), 'html.parser')

#print some information from the table to screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))

gives me the error:

AttributeError: 'NoneType' object has no attribute 'find_next'

Solution

  • What happens?

    As mentioned, there is no table with the id stats_standard, so select_one() returns None and calling find_next() on it raises the AttributeError. The id should be stats_standard_10728.

    How to fix it and make it a bit more generic

    Change your table selector to:

    table = soup.select_one('table[id^="stats_standard"]')
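
    To see why the prefix selector helps, here is a minimal sketch on stand-in markup. The HTML string and the second table id are assumptions made for illustration, not FBref's real page; only the id stats_standard_10728 comes from the answer above.

```python
from bs4 import BeautifulSoup

# Stand-in markup (hypothetical): the table ids carry a numeric suffix.
html = """
<table id="stats_standard_10728">
  <tr><th>Player</th><td>FW</td></tr>
</table>
<table id="stats_keeper_10728">
  <tr><th>Keeper</th><td>GK</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# An exact id match finds nothing, so select_one() returns None.
exact = soup.select_one("#stats_standard")

# [id^="..."] matches any id that starts with the given prefix.
prefixed = soup.select_one('table[id^="stats_standard"]')

print(exact)           # None
print(prefixed["id"])  # stats_standard_10728
```

    Because the match is on the prefix only, the selector should keep working if the numeric suffix changes.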
    

    Example

    import requests
    from bs4 import BeautifulSoup
    
    
    url = 'https://fbref.com/en/squads/18bb7c10/Arsenal-Stats'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    
    table = soup.select_one('table[id^="stats_standard"]')
    
    #print some information from the table to screen:
    for tr in table.select('tr:has(td)'):
        tds = [td.get_text(strip=True) for td in tr.select('td')]
        print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))
    

    Just in case

    You can make your life much easier by using pandas' read_html() to grab, display, and modify table data.

    Example

    import pandas as pd
    pd.read_html('https://fbref.com/en/squads/18bb7c10/Arsenal-Stats')[0]
    

    Need to know how to handle commented tables? See: How to scrape table from fbref could not be found by BeautifulSoup?
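
    For a table that sits inside an HTML comment, one possible approach is to locate the comment node and parse its text as a second document. The sketch below runs on stand-in markup; the wrapper id all_stats_shooting, the table id, and the player rows are hypothetical and only illustrate the technique.

```python
from bs4 import BeautifulSoup, Comment

# Stand-in markup (hypothetical ids and data): the table is wrapped in a comment.
html = """
<div id="all_stats_shooting">
  <!--
  <table id="stats_shooting_10728">
    <tr><th>Player A</th><td>FW</td><td>12</td></tr>
    <tr><th>Player B</th><td>MF</td><td>3</td></tr>
  </table>
  -->
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# While commented out, the table is not part of the parsed tree.
visible_table = soup.select_one("table")

# Find the first Comment node inside the wrapper element.
wrapper = soup.select_one("#all_stats_shooting")
comment = wrapper.find(string=lambda s: isinstance(s, Comment))

# Parse the comment's text as HTML and pick the table out of it.
table = BeautifulSoup(comment, "html.parser").select_one("table")

# Collect the cell text of every row, header cell first.
rows = [
    [cell.get_text(strip=True) for cell in tr.select("th, td")]
    for tr in table.select("tr")
]
print(rows)
```

    The string= argument is the current name of the older text= argument used in the question's code.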