New to web scraping and coding in general. This is probably an easy problem for someone more experienced... maybe not... here it is:
Trying to scrape a table from Wikipedia. I've located the table in the HTML and added that info to my code. However, when I run it I get None returned instead of confirmation that the table has been correctly located.
from bs4 import BeautifulSoup
from urllib.request import urlopen
url = 'https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table',{'class':'wikitable sortable plainrowheaders jquery-tablesorter'})
print(table)
Return: None
Remove jquery-tablesorter from the "class" string. That class is added by JavaScript after the page loads, so BeautifulSoup never sees it. (Note: always check the real HTML document the server sends you, since that is what BeautifulSoup sees; press Ctrl+U in your browser to view it.) With that class removed:
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable sortable plainrowheaders"})
print(table)
Prints:
<table class="wikitable sortable plainrowheaders" style="text-align:center">
<caption>Name of song, core catalogue release, songwriter, lead vocalist and year of original release
</caption>
<tbody><tr>
<th scope="col">Song
</th>
<th scope="col">Core catalogue release(s)
</th>
<th scope="col">Songwriter(s)
</th>
...
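If you would rather not depend on the exact class string, a CSS selector is one possible alternative. The sketch below is only an illustration: the inline HTML is a hypothetical stand-in for the page source (so it runs without a network request), and the variable names are made up for this example. It shows that a class string containing spaces is compared against the whole attribute value, while select_one only requires each listed class to be present.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the server sends (no JavaScript has run yet).
html = """
<table class="wikitable sortable plainrowheaders" style="text-align:center">
<tbody><tr><th scope="col">Song</th></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# A class string with spaces is matched against the full attribute value,
# so the extra JavaScript-added class makes this lookup fail.
exact_with_js_class = soup.find(
    "table", {"class": "wikitable sortable plainrowheaders jquery-tablesorter"}
)

# A CSS selector only requires each listed class to be present, in any order.
by_selector = soup.select_one("table.wikitable.sortable.plainrowheaders")

print(exact_with_js_class is None)
print(by_selector is not None)
```

On the real page you would pass the urlopen(url) result to BeautifulSoup instead of the inline string. Note that a selector like this matches the first table carrying those classes, so check that it is the table you want if the page has several.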