python beautifulsoup

How to get a nested element in Beautiful Soup


I am struggling with the syntax required to grab some hrefs in a <td>. The <table>, <tr> and <td> elements don't have any classes or ids.

If I wanted to grab the anchor in this example, what would I need?

<tr>
    <td>
        <a>...</a>
    </td>
</tr>

Solution

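  • As per the docs, you first build a parse tree. With the current bs4 package (Beautiful Soup 4, which replaces the discontinued `import BeautifulSoup` module):

    from bs4 import BeautifulSoup
    html = "<html><body><tr><td><a href='foo'/></td></tr></body></html>"
    # the built-in html.parser keeps the bare <tr>/<td> tags as written
    soup = BeautifulSoup(html, "html.parser")

    Then you search the tree, for example for <a> tags whose immediate parent is a <td>:

    for ana in soup.find_all('a'):
        # .parent is the tag that directly encloses this <a>
        if ana.parent.name == 'td':
            print(ana["href"])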
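
The parent check in the loop above can also be written as a CSS selector. The following is a minimal sketch, assuming Beautiful Soup 4 (the bs4 package, whose select() method takes CSS selectors); the sample markup and the variable names direct_hrefs and nested_hrefs are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one <a> directly inside a <td>,
# and one <a> wrapped in a <p> inside a <td>.
html = (
    "<table>"
    "<tr><td><a href='foo'>first</a></td></tr>"
    "<tr><td><p><a href='bar'>second</a></p></td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")

# "td > a" matches only <a> tags that are direct children of a <td>,
# which mirrors the ana.parent.name == 'td' check.
direct_hrefs = [a["href"] for a in soup.select("td > a[href]")]

# "td a" matches <a> tags at any depth inside a <td>.
nested_hrefs = [a["href"] for a in soup.select("td a[href]")]

print(direct_hrefs)
print(nested_hrefs)
```

The [href] part of the selector skips anchors that have no href attribute, so the a["href"] lookup cannot raise a KeyError. Use the descendant form ("td a") if the links may be wrapped in other tags inside the cell.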