I am having trouble scraping ESPN Gamecast links from the ESPN scoreboard page. I have tried:
import requests
from bs4 import BeautifulSoup

site = "https://www.espn.com/mlb/scoreboard"
html = requests.get(site).text
soup = BeautifulSoup(html, 'html.parser').find_all('a')
links = [link.get('href') for link in soup]
but the links are not being recognized.
The page is loaded dynamically, so you need to either a) use something like Selenium, which lets the page render before you parse it with bs4, or b) go straight to the data source/API. The API is often the best option:
import requests

api = 'http://site.api.espn.com/apis/site/v2/sports/baseball/mlb/scoreboard'
jsonData = requests.get(api).json()
events = jsonData['events']

links = []
for event in events:
    # Each event carries its own list of links (Gamecast and others).
    event_links = event['links']
    for each in event_links:
        # Keep only the link labelled 'Gamecast'.
        if each['text'] == 'Gamecast':
            links.append(each['href'])
Output:
print(links)
['http://www.espn.com/mlb/game/_/gameId/401228229', 'http://www.espn.com/mlb/game/_/gameId/401228235', 'http://www.espn.com/mlb/game/_/gameId/401228242', 'http://www.espn.com/mlb/game/_/gameId/401228240', 'http://www.espn.com/mlb/game/_/gameId/401228233', 'http://www.espn.com/mlb/game/_/gameId/401228234', 'http://www.espn.com/mlb/game/_/gameId/401228239', 'http://www.espn.com/mlb/game/_/gameId/401228237', 'http://www.espn.com/mlb/game/_/gameId/401228231', 'http://www.espn.com/mlb/game/_/gameId/401228232', 'http://www.espn.com/mlb/game/_/gameId/401228236', 'http://www.espn.com/mlb/game/_/gameId/401228230', 'http://www.espn.com/mlb/game/_/gameId/401228238', 'http://www.espn.com/mlb/game/_/gameId/401228243', 'http://www.espn.com/mlb/game/_/gameId/401228241']
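If you want to reuse this, the loop above could be wrapped in a small function that also tolerates events with no 'links' key. This is only a sketch: the name gamecast_links and the sample dictionary are made up for illustration, and the sample is hand-written to mimic the structure used above rather than copied from a real API response.

```python
from typing import Any


def gamecast_links(scoreboard: dict[str, Any]) -> list[str]:
    """Collect Gamecast URLs from a scoreboard payload shaped like the one above."""
    links = []
    # .get() with defaults, so a missing 'events' or 'links' key
    # yields no links instead of raising KeyError.
    for event in scoreboard.get("events", []):
        for link in event.get("links", []):
            if link.get("text") == "Gamecast" and "href" in link:
                links.append(link["href"])
    return links


# Hand-written sample (an assumption, not real API output).
sample = {
    "events": [
        {
            "links": [
                {"text": "Gamecast", "href": "http://www.espn.com/mlb/game/_/gameId/401228229"},
                {"text": "Box Score", "href": "https://example.com/boxscore"},
            ]
        },
        {"name": "event without a links key"},
    ]
}

print(gamecast_links(sample))
```

With the live endpoint you would call it as gamecast_links(requests.get(api).json()), assuming the response keeps the 'events' / 'links' / 'text' / 'href' layout shown in the answer.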