python beautifulsoup splinter

Beautiful Soup not scraping all elements in page


I'm trying to scrape all match odds from this betting page:

[screenshot: odds table]

This is what the console shows me (for the first and second rows):

[screenshot: console output]

I need to scrape:

  1. the match name, i.e. 'Palmeiras - Coritiba' etc.

  2. column '1', column 'X' and column '2' values for each row.

So far I have this code:

from splinter import Browser
from bs4 import BeautifulSoup

executable_path = {"executable_path": "/path/to/geckodriver"}
browser = Browser("firefox", **executable_path, headless=True, incognito=True)

bets = 'https://www.oddsportal.com/soccer/brazil/serie-a/'
browser.visit(bets)
# parse html
soup = BeautifulSoup(browser.html, 'html.parser')

odds = soup.find_all('tr', class_="odd")
for el in odds:
    print (el.find('a').contents[0])
    print (el.find('td', class_='odds-nowrp'))

but I'm only getting the column '1' values, and for only 6 of the 9 rows:

<td class="odds-nowrp" xodd="1.52" xoid="E-3pdmnxv464x0x9vma2"><a href="" onclick="globals.ch.togle(this , 'E-3pdmnxv464x0x9vma2');return false;" xparam="odds_text">1.52</a></td>
 
<td class="odds-nowrp" xodd="1.9" xoid="E-3pdmoxv464x0x9vma4"><a href="" onclick="globals.ch.togle(this , 'E-3pdmoxv464x0x9vma4');return false;" xparam="odds_text">1.90</a></td>
 
<td class="odds-nowrp" xodd="2.17" xoid="E-3pdmsxv464x0x9vmac"><a href="" onclick="globals.ch.togle(this , 'E-3pdmsxv464x0x9vmac');return false;" xparam="odds_text">2.17</a></td>
 
<td class="odds-nowrp" xodd="4.24" xoid="E-3pdmrxv464x0x9vmaa"><a href="" onclick="globals.ch.togle(this , 'E-3pdmrxv464x0x9vmaa');return false;" xparam="odds_text">4.24</a></td>
 
<td class="odds-nowrp" xodd="2.49" xoid="E-3pdmuxv464x0x9vmag"><a href="" onclick="globals.ch.togle(this , 'E-3pdmuxv464x0x9vmag');return false;" xparam="odds_text">2.49</a></td>
 
<td class="odds-nowrp" xodd="4.08" xoid="E-3pdn3xv464x0x9vmaq"><a href="" onclick="globals.ch.togle(this , 'E-3pdn3xv464x0x9vmaq');return false;" xparam="odds_text">4.08</a></td>

and I'm not getting the text of the <a> tag, which is what I want.

How do I fetch the match names and all column values on this page?


Solution

  • You are not getting all the rows because of your selector. You are using the CSS class odd, which is applied only to the odd-numbered rows (the ones with the white background).

    As for not getting the text of the <a> tag: there are two <a> tags in the first column of each row, and you are reading the contents of the first one, which does not contain any text.

    Try this approach instead: rather than looking for rows, look for the first column (the <td> that contains the match name) and then read the <td> elements that follow it using findNext.

    Example for printing the match name and the value of the column '1':

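    odds = soup.find_all('td', class_="name table-participant")
    for el in odds:
        # find_next() is the current spelling of the legacy findNext() alias.
        # The first <a> in the cell has no text; the second one holds the match name.
        print(el.find('a').find_next('a').contents[0])
        # The next <td> after the name cell is column '1'.
        print(el.find_next('td').find('a').contents[0])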
    

    EDIT: This example prints the match name and all odds (it also accounts for the change in HTML structure when the table contains finished matches):

    odds = soup.find_all('td', class_="name table-participant")
    for el in odds:
        # Join the text of every <a> in the name cell; links without text add nothing.
        links_in_first_column = el.find_all('a')
        match_name = ''.join(link.text.strip() for link in links_in_first_column)
        print(match_name)

        # xodd=True keeps only the sibling <td> cells that have an xodd attribute.
        odds_columns = el.find_next_siblings('td', xodd=True)
        for column in odds_columns[:3]:  # columns '1', 'X' and '2'
            print(column['xodd'])
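
To try the approach above without launching a browser, here is a minimal offline sketch. It is only an illustration: SAMPLE_HTML is a hypothetical, heavily simplified stand-in for the real page, the second match name and most odds values are made up, and extract_odds is a helper name introduced here, not part of any library. It assumes the structure described above: one row with class odd and one without, and a name cell whose first <a> has no text.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on the structure described above.
# The real page has more columns and attributes; "Team A - Team B" and most
# odds values are invented for illustration.
SAMPLE_HTML = """
<table>
  <tr class="odd">
    <td class="name table-participant"><a href="#"></a><a href="/m1/">Palmeiras - Coritiba</a></td>
    <td class="odds-nowrp" xodd="1.52"><a href="">1.52</a></td>
    <td class="odds-nowrp" xodd="4.10"><a href="">4.10</a></td>
    <td class="odds-nowrp" xodd="6.30"><a href="">6.30</a></td>
    <td class="info-value">12</td>
  </tr>
  <tr>
    <td class="name table-participant"><a href="#"></a><a href="/m2/">Team A - Team B</a></td>
    <td class="odds-nowrp" xodd="1.9"><a href="">1.90</a></td>
    <td class="odds-nowrp" xodd="3.40"><a href="">3.40</a></td>
    <td class="odds-nowrp" xodd="4.00"><a href="">4.00</a></td>
    <td class="info-value">12</td>
  </tr>
</table>
"""


def extract_odds(html):
    """Return a list of (match_name, [odds for '1', 'X', '2']) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Start from the name cell instead of the row, so the row's class is irrelevant.
    for cell in soup.find_all("td", class_="name table-participant"):
        # Join the text of all links; the text-less first <a> contributes nothing.
        name = "".join(a.get_text(strip=True) for a in cell.find_all("a"))
        # Keep only the following sibling cells that have an xodd attribute.
        odds_cells = cell.find_next_siblings("td", xodd=True)
        results.append((name, [td["xodd"] for td in odds_cells[:3]]))
    return results


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    # Selecting rows by class "odd" misses every row that lacks that class.
    print(len(soup.find_all("tr", class_="odd")), "of", len(soup.find_all("tr")), "rows matched")
    for match_name, odds in extract_odds(SAMPLE_HTML):
        print(match_name, odds)
```

With the real page, the same helper could be called as extract_odds(browser.html) after browser.visit(bets), provided the live markup still matches these assumptions.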