python html scraper

Print the content of the "p" tag after a header in HTML


I'm trying to complete a data scraper assignment. It all works except for the last part, in which I need to print the descriptions of cybersecurity vulnerabilities reported to a website, based on user search criteria.

for index in range(2): 
    response = requests.get(url_values[index])
    content = response.content
    soup = BeautifulSoup(content,"lxml")
    #find the table content
    for header in soup.find_all("h3", string = "Description"):
        text = find_next.("p")
        print (text)

This is what the HTML looks like in the area I'm trying to get information from:

 ...<section class="content-band">              
        <div class="content">



            <h3>Risk</h3>                           

            <div><p>Low</p></div>






            <h3>Date Discovered</h3>
            <p>February 12, 2019</p>




            <h3>Description</h3>
            <p>Microsoft Windows is prone to a local information-disclosure 
             vulnerability.                                                                        

            Local attackers can exploit this issue to obtain sensitive 
            information that may lead to further attacks.</p>




            <h3>Technologies Affected</h3>...

I want the content (which is in a p element) of the "Description" header (which is an h3 element). I've tried "find_next_sibling" similarly and can't seem to get it working.

Any advice is appreciated.


Solution

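  • In your loop, "find_next" is not called on anything (and "find_next.(" is a syntax error); it has to be called on the matched header, e.g. header.find_next("p"). You can also get the text of the element that directly follows the "Description" h3 in one step: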

    print(soup.find("h3", string="Description").find_next_sibling().text)
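
  • As a fuller sketch of the same idea, the loop from the question could be adapted roughly as below. The names SAMPLE_HTML and get_descriptions are hypothetical, the markup is a trimmed copy of the snippet in the question, and the parser is swapped to the built-in "html.parser" only so the example runs without lxml installed ("lxml" should behave the same for this markup). Passing "p" to find_next_sibling makes the target explicit, and the None check guards against a header with no following paragraph.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the real pages
# fetched with requests have the same h3/p layout).
SAMPLE_HTML = """
<section class="content-band">
  <div class="content">
    <h3>Risk</h3>
    <div><p>Low</p></div>
    <h3>Date Discovered</h3>
    <p>February 12, 2019</p>
    <h3>Description</h3>
    <p>Microsoft Windows is prone to a local information-disclosure
       vulnerability.

       Local attackers can exploit this issue to obtain sensitive
       information that may lead to further attacks.</p>
    <h3>Technologies Affected</h3>
  </div>
</section>
"""


def get_descriptions(html):
    """Return the text of the <p> sibling following each <h3>Description</h3>."""
    # "html.parser" is used here instead of "lxml" to avoid an extra dependency.
    soup = BeautifulSoup(html, "html.parser")
    descriptions = []
    for header in soup.find_all("h3", string="Description"):
        # Call the search on the header itself, not as a bare function.
        paragraph = header.find_next_sibling("p")
        if paragraph is not None:
            # Collapse the line breaks and runs of spaces inside the paragraph.
            descriptions.append(" ".join(paragraph.get_text().split()))
    return descriptions


descriptions = get_descriptions(SAMPLE_HTML)
for description in descriptions:
    print(description)
```

    In the original loop, the same function could be called on response.content for each URL in url_values.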