Tags: python, html, beautifulsoup, filter

How to find all <a href> with a specific anchor text using BeautifulSoup


I am trying to use Beautiful Soup to parse HTML and find every href whose link has a specific anchor text:

<a href="http://example.com">TEXT</a>
<a href="http://example.com/link">TEXT</a>
<a href="http://example.com/page">TEXT</a>

All the links I am looking for have exactly the same anchor text, in this case TEXT. I am NOT looking for the word TEXT itself; I want to use the word TEXT to find all the different href values.

To clarify, I am looking for something similar to using the class to find the links:

<a href="http://example.com" class="visible">TEXT</a>
<a href="http://example.com/link" class="visible">TEXT</a>
<a href="http://example.com/page" class="visible">TEXT</a>

and then using

findAll('a', 'visible')

except that the links in the HTML I am parsing have no class, only the same anchor text every time.
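For comparison, here is a minimal sketch of the class-based lookup described above, using the sample markup from the question. When the second positional argument of find_all is a plain string, Beautiful Soup matches it against the CSS class, so this behaves like find_all('a', class_='visible').

```python
from bs4 import BeautifulSoup

html = """\
<a href="http://example.com" class="visible">TEXT</a>
<a href="http://example.com/link" class="visible">TEXT</a>
<a href="http://example.com/page" class="visible">TEXT</a>"""

soup = BeautifulSoup(html, "html.parser")

# A string as the second positional argument is matched against the class attribute
by_class = [a["href"] for a in soup.find_all("a", "visible")]
print(by_class)
```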


Solution

  • Would something like this work?

    from bs4 import BeautifulSoup

    s = """\
    <a href="http://example.com">TEXT</a>
    <a href="http://example.com/link">TEXT</a>
    <a href="http://example.com/page">TEXT</a>
    <a href="http://dontmatchme.com/page">WRONGTEXT</a>"""

    # Name the parser explicitly to avoid bs4's "no parser specified" warning
    soup = BeautifulSoup(s, 'html.parser')

    # href=True: only <a> tags that have an href attribute
    # string='TEXT': only tags whose text is exactly 'TEXT'
    # (this argument is called text= in Beautiful Soup versions before 4.4.0)
    for link in soup.find_all('a', href=True, string='TEXT'):
        print(link['href'])

    # Output:
    # http://example.com
    # http://example.com/link
    # http://example.com/page
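One caveat: string='TEXT' is an exact match, so it may miss links whose text has extra whitespace or sits next to other tags such as an icon. A minimal sketch of a more tolerant filter passes a function to find_all, which calls it once per tag and keeps the tags for which it returns True. The helper name is_text_link and the extra sample links are hypothetical, added only for illustration.

```python
from bs4 import BeautifulSoup

html = """\
<a href="http://example.com">TEXT</a>
<a href="http://example.com/link"> TEXT </a>
<a href="http://example.com/icon"><img src="icon.png"> TEXT</a>
<a>TEXT</a>
<a href="http://dontmatchme.com/page">WRONGTEXT</a>"""


def is_text_link(tag, text="TEXT"):
    """Hypothetical helper: True for <a> tags that have an href and whose
    visible text, with surrounding whitespace stripped, equals `text`."""
    return (
        tag.name == "a"
        and tag.has_attr("href")
        and tag.get_text(strip=True) == text
    )


soup = BeautifulSoup(html, "html.parser")

# find_all accepts a function and keeps every tag for which it returns True
hrefs = [a["href"] for a in soup.find_all(is_text_link)]
print(hrefs)
```

The <a> without an href and the WRONGTEXT link are skipped, while the whitespace-padded and icon links are kept because get_text(strip=True) ignores the surrounding whitespace and the empty <img> tag.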