python web-scraping beautifulsoup mechanize

BeautifulSoup: Finding a specific URL in HTML and printing it


OK, so I have this HTML page (full of different URLs), from which I want to grab a single URL and print it.

The webpage is: https://bdkv2.borger.dk/foa/Sider/default.aspx?fk=22&foaid=11523251

I want to print the url www.albertslund.dk

It looks like this in the source code:

<a href="http://www.albertslund.dk" id="_uscAncHomesite" target="_blank"><strong><span id="ctl00_PlaceHolderMain_FormControlHandler1__uscShowDataAuthorityDetails__uscLblHomesite">http://www.albertslund.dk</span></strong></a>

When I try to grab it and print it by using its ID (using BeautifulSoup and Mechanize), it just returns an empty list. I would like to grab the URL using the ID, because I'm scraping a bunch of similar sites where the things that I want have the same ID.

kommuneside = br.open(https://bdkv2.borger.dk/foa/Sider/default.aspx?fk=22&foaid=11523251)
html2 = kommuneside.read()
soup2 = BeautifulSoup(html2)
hjemmesidelink = soup2.findAll('a', attras={'ID':'_uscAncHomesite'})
print hjemmesidelink

This returns just an empty list: []

If I try like this:

print hjemmesidelink['href']

I get: TypeError: list indices must be integers, not str

I would've thought that it was pretty straightforward, but I'm a rookie, and it has bugged me for days now.


Solution

  • There are a number of typos in your code, so I can't say for sure why your search doesn't match anything, but the most likely problem is that you're searching for the attribute "ID" (uppercase), but the attribute in the markup is "id" (lowercase).

    Since you only want to find one tag, I recommend you use find(), which will return the tag on its own, instead of a list containing the tag. This is how I would write the code:

    print soup.find('a', id='_uscAncHomesite')
    # <a href="http://www.albertslund.dk" id="_uscAncHomesite" target="_blank">...</a>
    

    Incidentally, your use of findAll makes me think you're using Beautiful Soup 3. I recommend Beautiful Soup 4 for all new projects.
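
For reference, here is a minimal sketch of the same lookup in Beautiful Soup 4, including how to read the href once the tag is found. It assumes the bs4 package is installed and uses Python's built-in "html.parser"; the markup is a shortened copy of the snippet from the question rather than the live page, and the variable names are only illustrative. It also shows why indexing the result of find_all() with a string raises the TypeError from the question: find_all() returns a list, so you have to pick an element before asking for an attribute.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (an assumption: the real
# page source would normally come from br.open(...).read()).
html = (
    '<a href="http://www.albertslund.dk" id="_uscAncHomesite" target="_blank">'
    '<strong><span>http://www.albertslund.dk</span></strong></a>'
)

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None if nothing matches.
link = soup.find("a", id="_uscAncHomesite")
homepage_url = link["href"] if link is not None else None
print(homepage_url)

# find_all() always returns a list, so index into it before reading an attribute.
links = soup.find_all("a", id="_uscAncHomesite")
first_url = links[0]["href"] if links else None

# Attribute names are matched case-sensitively, and this parser stores them
# in lowercase, so searching for "ID" finds nothing.
uppercase_matches = soup.find_all("a", attrs={"ID": "_uscAncHomesite"})
```

Checking for None (or an empty list) before reading href is worth keeping when you loop over many similar pages, since some of them may not contain the link at all.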