I am new to coding and need some assistance. I am trying to make a web scraper for a project that involves scraping NFL roster data from 2000 to 2023, but I get an error when requesting the HTML. I am using JupyterLab (Python-Pyodide kernel) to write my code, and this is the only code I have:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from io import StringIO
years = list(range(2000, 2024))
url = 'https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023'
data = requests.get(url)
This is the error I'm getting:
(JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023'.)
Can you explain why I am getting this error and how I can fix it?
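You didn't specify any request headers, and without a browser-like user-agent the site may refuse the request. Also, this page doesn't use <table> tags, so you can't use pd.read_html; parse its div-based layout with BeautifulSoup instead. Finally, run this in a regular Python installation (for example, a local Jupyter) rather than the Pyodide kernel. In Pyodide the request is sent by the browser through XMLHttpRequest, as your error message shows, and the browser blocks cross-origin requests that the site doesn't explicitly allow, whatever headers you set.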
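import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023"
# Browser-like headers so the site serves the normal page
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
}

result = []
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop on 403/404 instead of parsing an error page
soup = BeautifulSoup(response.text, 'lxml')  # needs lxml installed; the built-in 'html.parser' should also work

# The roster is laid out with <div> elements styled as a table
table = soup.find('div', class_='divtable divtable-striped divtable-mobile')
# Column names come from the children of the header div
table_head = [head.get_text() for head in table.find('div', class_='thead')]

# Remove spans shown only on small screens (class 'visible-xs-inline') so their text doesn't end up in the cell values
for s in table.find_all('span', class_='visible-xs-inline'):
    s.extract()

# Each row div becomes a dict mapping column name -> cell text
for row in table.find_all('div', class_='tr'):
    result.append(dict(zip(table_head, [cell.get_text() for cell in row.find_all('div', class_='td')])))

df = pd.DataFrame(result)
print(df)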
Output:
     #            Player  Pos   G  GS  Age            College
0   82   Andre Baccellia   WR   5   0   26         Washington
1    3       Budda Baker   DB  12  12   27         Washington
2   96        Eric Banks   DE   2   0   25  Texas-San Antonio
3   51       Krys Barnes   LB  16   6   25               UCLA
4   66    Jackson Barton   OT   1   0   28               Utah
..  ..               ...   ..  ..  ..   ..                ...
73  21  Garrett Williams   DB   9   6   22           Syracuse
74  27     Divaad Wilson   DB   2   1   23    Central Florida
75  20      Marco Wilson   DB  15  11   24            Florida
76  14    Michael Wilson   WR  13  12   23           Stanford
77  10        Josh Woods   LB  11   7   27           Maryland
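Since the goal is every season from 2000 to 2023, the answer's code can be wrapped in functions and looped over the question's years list. This is a minimal sketch, and it rests on a few assumptions. Other seasons are assumed to use the same URL pattern and div layout as the 2023 page, since only that one URL appears above. The helper names roster_url, parse_roster and scrape_team, the added Season column and the 2-second delay are hypothetical choices, not anything the site documents. It uses the built-in "html.parser" so lxml isn't required, and it has to run in a regular Python installation, not the Pyodide kernel.

```python
import time
from typing import Iterable

import pandas as pd
import requests
from bs4 import BeautifulSoup
from bs4.element import Tag

# Browser-like headers, copied from the answer above
HEADERS = {
    "accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,"
        "image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ),
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36"
    ),
}


def roster_url(team_slug: str, year: int) -> str:
    # Assumption: every season follows the same URL pattern as the 2023 page
    return f"https://www.footballdb.com/teams/nfl/{team_slug}/roster/{year}"


def parse_roster(html: str) -> pd.DataFrame:
    """Turn one roster page's div-based table into a DataFrame (empty if not found)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("div", class_="divtable")
    head = table.find("div", class_="thead") if table is not None else None
    if head is None:
        return pd.DataFrame()

    # Remove small-screen-only label spans so they don't leak into cell text
    for span in table.find_all("span", class_="visible-xs-inline"):
        span.extract()

    # Header names: the element children of the header div (whitespace strings skipped)
    columns = [h.get_text(" ", strip=True) for h in head.children if isinstance(h, Tag)]

    rows = []
    for row in table.find_all("div", class_="tr"):
        cells = [c.get_text(" ", strip=True) for c in row.find_all("div", class_="td")]
        if cells:  # skip any row div that has no data cells
            rows.append(dict(zip(columns, cells)))
    return pd.DataFrame(rows)


def scrape_team(team_slug: str, years: Iterable[int], delay: float = 2.0) -> pd.DataFrame:
    """Download one team's rosters for the given seasons and stack them."""
    frames = []
    with requests.Session() as session:
        for year in years:
            response = session.get(roster_url(team_slug, year), headers=HEADERS, timeout=30)
            response.raise_for_status()
            df = parse_roster(response.text)
            df["Season"] = year  # hypothetical extra column to tell seasons apart
            frames.append(df)
            time.sleep(delay)  # pause between requests to go easy on the server
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

For example, scrape_team("arizona-cardinals", years) with the question's years list should return a single DataFrame for all 24 seasons. Other teams would need their own slug, presumably in the same form as "arizona-cardinals". Keeping the parsing in its own function lets you check it against a saved page without making any network requests.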