python, beautifulsoup, splinter

Beautiful Soup and Splinter - get href and src attributes


This is the code:

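from bs4 import BeautifulSoup
from splinter import Browser

browser = Browser('chrome')  # driver choice is an assumption; any Splinter driver works
url = 'https://www.premierleague.com/players'

# Load the page in the Splinter browser
browser.visit(url)

# Locate the player table by CSS selector (the result is not used below)
browser.find_by_css('div[class="table playerIndex"]')
soup = BeautifulSoup(browser.html, 'html.parser')
for el in soup:
    td = el.find_all('td')
    for each_td in td:
        url = each_td.find('a', href=True)
        print(url)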

This prints the targeted links, but each one is followed by None lines:

<a class="playerName" href="/players/19970/Max-Aarons/overview"><img alt="" class="img" data-player="p232980" data-script="pl_player-image" data-size="40x40" data-widget="player-image" src="//platform-static-files.s3.amazonaws.com/premierleague/photos/players/40x40/Photo-Missing.png"/>Max Aarons</a>
None
None
<a class="playerName" href="/players/13279/Abdul-Rahman-Baba/overview"><img alt="" class="img" data-player="p118335" data-script="pl_player-image" data-size="40x40" data-widget="player-image" src="//platform-static-files.s3.amazonaws.com/premierleague/photos/players/40x40/Photo-Missing.png"/>Abdul Rahman Baba</a>
None
None

How do I get href and src values?


Solution

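  • The None lines come from <td> cells that contain no <a> element: find() returns None when nothing matches. Check the result before using it, then read attributes from the tag as you would from a dictionary: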

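    # soup is the BeautifulSoup object built in the question
    for each_td in soup.find_all('td'):
        link = each_td.find('a', href=True)
        # find() returns None for cells without a match, so test before indexing
        if link:
            print(link['href'])
        image = each_td.find('img', src=True)
        if image:
            print(image['src'])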
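For a self-contained illustration, here is a sketch that parses a small hypothetical HTML fragment (modelled on the output in the question) instead of the live page. It swaps the nested loops for a CSS selector via soup.select(), so only anchors that actually carry an href are visited, and it uses urljoin to turn the relative href and the protocol-relative src into absolute URLs. The sample markup, BASE_URL and the extract_players helper are assumptions made for this example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup, trimmed from the output shown in the question
SAMPLE_HTML = """
<table><tbody>
<tr>
  <td><a class="playerName" href="/players/19970/Max-Aarons/overview"><img class="img" src="//platform-static-files.s3.amazonaws.com/premierleague/photos/players/40x40/Photo-Missing.png"/>Max Aarons</a></td>
  <td>Defender</td>
  <td>England</td>
</tr>
</tbody></table>
"""

# Assumed base URL used to resolve relative links
BASE_URL = "https://www.premierleague.com/players"


def extract_players(html, base_url=BASE_URL):
    """Return (name, profile URL, image URL) for each a.playerName link."""
    soup = BeautifulSoup(html, "html.parser")
    players = []
    # The selector only matches anchors that have an href attribute,
    # so cells without a link are never visited and never yield None
    for link in soup.select("a.playerName[href]"):
        img = link.find("img")
        # .get() returns None instead of raising KeyError if src is missing
        src = img.get("src") if img else None
        players.append((
            link.get_text(strip=True),
            urljoin(base_url, link["href"]),
            urljoin(base_url, src) if src else None,
        ))
    return players


for name, profile, photo in extract_players(SAMPLE_HTML):
    print(name, profile, photo)
```

In the Splinter flow from the question, you would pass browser.html to extract_players instead of SAMPLE_HTML.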