I'm trying to scrape all match odds from this betting page:
This is what console shows me (for the first and second rows):
I need to scrape:
the match name, e.g. 'Palmeiras - Coritiba'
column '1', column 'X' and column '2' values for each row.
So far I have this code:
from splinter import Browser
from bs4 import BeautifulSoup
executable_path = {"executable_path": "/path/to/geckodriver"}
browser = Browser("firefox", **executable_path, headless=True, incognito=True)
bets = 'https://www.oddsportal.com/soccer/brazil/serie-a/'
browser.visit(bets)
# parse html
soup = BeautifulSoup(browser.html, 'html.parser')
odds = soup.find_all('tr', class_="odd")
for el in odds:
    print(el.find('a').contents[0])
    print(el.find('td', class_='odds-nowrp'))
But I'm only getting the column '1' values, and only for 6 of the 9 rows:
<td class="odds-nowrp" xodd="1.52" xoid="E-3pdmnxv464x0x9vma2"><a href="" onclick="globals.ch.togle(this , 'E-3pdmnxv464x0x9vma2');return false;" xparam="odds_text">1.52</a></td>
<td class="odds-nowrp" xodd="1.9" xoid="E-3pdmoxv464x0x9vma4"><a href="" onclick="globals.ch.togle(this , 'E-3pdmoxv464x0x9vma4');return false;" xparam="odds_text">1.90</a></td>
<td class="odds-nowrp" xodd="2.17" xoid="E-3pdmsxv464x0x9vmac"><a href="" onclick="globals.ch.togle(this , 'E-3pdmsxv464x0x9vmac');return false;" xparam="odds_text">2.17</a></td>
<td class="odds-nowrp" xodd="4.24" xoid="E-3pdmrxv464x0x9vmaa"><a href="" onclick="globals.ch.togle(this , 'E-3pdmrxv464x0x9vmaa');return false;" xparam="odds_text">4.24</a></td>
<td class="odds-nowrp" xodd="2.49" xoid="E-3pdmuxv464x0x9vmag"><a href="" onclick="globals.ch.togle(this , 'E-3pdmuxv464x0x9vmag');return false;" xparam="odds_text">2.49</a></td>
<td class="odds-nowrp" xodd="4.08" xoid="E-3pdn3xv464x0x9vmaq"><a href="" onclick="globals.ch.togle(this , 'E-3pdn3xv464x0x9vmaq');return false;" xparam="odds_text">4.08</a></td>
I'm also not getting the text of the 'a' tag that I want.
How do I fetch the match names and all column values on this page?
You are not getting all the rows because of your selector. You are using the CSS class odd, which is only applied to odd-numbered rows (the ones with the white background).
As for not getting the text of the <a> tag, this happens because there are two <a> tags in the first column of each row, and you are reading the content of the first one, which does not contain any text.
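Both problems can be reproduced on a small snippet. The following is only a sketch: the markup is hypothetical and heavily simplified (the real rows have more cells and attributes), and the second match name is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; not copied from the real page.
SAMPLE_HTML = """
<table>
  <tr class="odd">
    <td class="name table-participant"><a href="#"></a><a href="/m1/">Palmeiras - Coritiba</a></td>
    <td class="odds-nowrp" xodd="1.52"><a href="">1.52</a></td>
  </tr>
  <tr>
    <td class="name table-participant"><a href="#"></a><a href="/m2/">Team A - Team B</a></td>
    <td class="odds-nowrp" xodd="1.9"><a href="">1.90</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Filtering on class "odd" only matches the rows that carry that class.
odd_rows = soup.find_all("tr", class_="odd")

# Filtering on the match-name cell matches every row.
name_cells = soup.find_all("td", class_="name table-participant")

# The first <a> in the cell is empty, so its text is an empty string.
first_link_text = name_cells[0].find("a").get_text()

# Joining the text of every <a> in the cell recovers the match name.
match_names = [
    "".join(a.get_text(strip=True) for a in cell.find_all("a"))
    for cell in name_cells
]

print(len(odd_rows), len(name_cells))
print(repr(first_link_text))
print(match_names)
```

Here the class-based row selector finds one row while the cell-based selector finds both, which mirrors the "6 out of 9 rows" symptom.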
You can try this approach: instead of looking for rows, look for the first column (the <td> which contains the match name) and then look at the <td> elements that follow it using findNext (the older camelCase alias of find_next).
Example for printing the match name and the value of the column '1':
odds = soup.find_all('td', class_="name table-participant")
for el in odds:
    print(el.find('a').findNext('a').contents[0])
    print(el.findNext('td').find('a').contents[0])
EDIT: This example prints the match name and all three odds (it also accounts for the change in HTML structure when the table contains finished matches):
odds = soup.find_all('td', class_="name table-participant")
for el in odds:
    links_in_first_column = el.find_all('a')
    match_name = ''.join(map(lambda e: e.text.strip(), links_in_first_column))
    print(match_name)
    odds_columns = el.find_next_siblings('td', xodd=True)
    print(odds_columns[0]['xodd'])
    print(odds_columns[1]['xodd'])
    print(odds_columns[2]['xodd'])
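If you would rather collect the values than print them, the same idea can be wrapped in a function. This is a minimal sketch under some assumptions: extract_odds is a hypothetical helper name, the sample markup and its odds values are made up for illustration, and the xodd attribute is assumed to hold a plain decimal number as in the output shown in the question. Rows with fewer than three xodd cells are skipped instead of raising an IndexError.

```python
from bs4 import BeautifulSoup


def extract_odds(html):
    """Return a list of dicts with the match name and the 1/X/2 odds."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for cell in soup.find_all("td", class_="name table-participant"):
        # Join the text of all links, since the first <a> may be empty.
        match_name = "".join(a.get_text(strip=True) for a in cell.find_all("a"))
        # Only sibling cells that carry an xodd attribute hold odds.
        odds_cells = cell.find_next_siblings("td", xodd=True)
        if len(odds_cells) < 3:
            continue  # incomplete row: skip it rather than fail
        home, draw, away = (float(c["xodd"]) for c in odds_cells[:3])
        results.append({"match": match_name, "1": home, "X": draw, "2": away})
    return results


# Hypothetical sample markup with made-up values, for illustration only.
SAMPLE_HTML = """
<table>
  <tr class="odd">
    <td class="name table-participant"><a href="#"></a><a href="/m1/">Team A - Team B</a></td>
    <td class="odds-nowrp" xodd="1.5"><a href="">1.50</a></td>
    <td class="odds-nowrp" xodd="4.0"><a href="">4.00</a></td>
    <td class="odds-nowrp" xodd="6.5"><a href="">6.50</a></td>
  </tr>
  <tr>
    <td class="name table-participant"><a href="#"></a><a href="/m2/">Team C - Team D</a></td>
    <td class="odds-nowrp" xodd="2.5"><a href="">2.50</a></td>
  </tr>
</table>
"""

rows = extract_odds(SAMPLE_HTML)
for row in rows:
    print(row)
```

With the real page you would call extract_odds(browser.html) after browser.visit(bets); whether every row there exposes three xodd cells is something to check against the live HTML.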