
Web Scraping Wikipedia Table Using Beautiful Soup Getting 'None' Returned


New to web scraping and coding in general. This is probably an easy problem for someone more experienced... maybe not... here it is:

I'm trying to scrape a table from Wikipedia. I've located the table in the HTML and added its class to my code. However, when I run it, I get None returned instead of confirmation that the table has been correctly located.

from bs4 import BeautifulSoup
from urllib.request import urlopen


url = 'https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles'
html = urlopen(url) 
soup = BeautifulSoup(html, 'html.parser')            

table = soup.find('table',{'class':'wikitable sortable plainrowheaders jquery-tablesorter'})
print(table)

Return: None


Solution

  • Remove jquery-tablesorter from the "class" string. This class is added by JavaScript in the browser, so BeautifulSoup doesn't see it. (Note: always look at the real HTML document the server sends you, since that's what BeautifulSoup sees; press Ctrl+U in your browser to view it.) The corrected code:

    from urllib.request import urlopen
    
    from bs4 import BeautifulSoup
    
    url = "https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles"
    html = urlopen(url)
    soup = BeautifulSoup(html, "html.parser")
    
    table = soup.find("table", {"class": "wikitable sortable plainrowheaders"})
    print(table)
    

    Prints:

    <table class="wikitable sortable plainrowheaders" style="text-align:center">
    <caption>Name of song, core catalogue release, songwriter, lead vocalist and year of original release
    </caption>
    <tbody><tr>
    <th scope="col">Song
    </th>
    <th scope="col">Core catalogue release(s)
    </th>
    <th scope="col">Songwriter(s)
    </th>
    
    ...
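
To see the difference without hitting the network, here is a minimal offline sketch. The `html` string is a hypothetical, trimmed-down stand-in for what the server sends (it is not the full Wikipedia page), and the variable names are only for illustration. It shows that the class string containing the JavaScript-added class finds nothing, while the served class string, a single class name, or a CSS selector all find the table.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the served HTML: JavaScript has not run,
# so the table has no jquery-tablesorter class.
html = """
<table class="wikitable sortable plainrowheaders" style="text-align:center">
<tbody><tr><th scope="col">Song</th></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# List the classes BeautifulSoup actually sees on each table.
for t in soup.find_all("table"):
    print(t.get("class"))

# Class string including the JavaScript-added class: no match.
with_js_class = soup.find(
    "table", {"class": "wikitable sortable plainrowheaders jquery-tablesorter"}
)

# Class string exactly as served: matches.
exact = soup.find("table", {"class": "wikitable sortable plainrowheaders"})

# A single class name matches no matter what other classes are present.
by_one_class = soup.find("table", class_="wikitable")

# A CSS selector requires every listed class, in any order.
by_selector = soup.select_one("table.wikitable.sortable")

print(with_js_class is None)
print(exact is not None, by_one_class is not None, by_selector is not None)
```

Matching on one class name or using a CSS selector is usually less brittle than matching the full class string, since it keeps working if the page adds or reorders classes.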