I want to extract a block of text from inside a div tag. I've seen several posts that select a div by its attributes, but the tag I want has no attributes: it's just <div>.
Below is an excerpt of the HTML. There are dozens of div tags above and below it, but this is the only one that is a bare <div>.
<div>
<!-- Some text. -->
<i>
[Text I want block 1]
</i>
text I want 1
<br/>
text I want 2
<br/>
text I want 3
<br/>
<br/>
</div>
However, any find method with "div" returns too much. I tried the following:
1) String and tag searches pick up every tag containing div
soup.find("div")
soup.div
2) Isolating the parent and then searching for div within it still returns too much.
divParent = soup.find("div", class_="col-xs-12 col-lg-8 text-center")
divParent.find("div")
Any ideas? div seems to be too common a tag/string to isolate.
One way to do it is to select the <i> tag nested inside a div and take the text of its parent:
from bs4 import BeautifulSoup
content='''
<div>
<!-- Some text. -->
<i>
[Text I want block 1]
</i>
text I want 1
<br/>
text I want 2
<br/>
text I want 3
<br/>
<br/>
</div>
'''
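# Parse the markup (the "lxml" parser requires the lxml package to be installed).
soup = BeautifulSoup(content, "lxml")
# Select every <i> nested inside a <div>, then take the text of each match's parent (the <div> here).
data = ''.join(item.parent.text.strip() for item in soup.select('div i'))
print(data)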
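The selector above depends on the target div containing an <i> tag. If that isn't guaranteed, another option is to match on the absence of attributes directly, since find() also accepts a function. The following is a minimal sketch, not a definitive solution: the surrounding markup (the "header" and "footer" divs) is made up for illustration, the helper name is_bare_div is hypothetical, and it uses the built-in "html.parser" so lxml is not required.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the sibling divs and their attributes are assumptions,
# added only to show that divs with attributes are skipped.
html = """
<div class="col-xs-12 col-lg-8 text-center">
  <div id="header">Header</div>
  <div>
    <!-- Some text. -->
    <i>[Text I want block 1]</i>
    text I want 1<br/>
    text I want 2<br/>
    text I want 3<br/><br/>
  </div>
  <div class="footer">Footer</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def is_bare_div(tag):
    """Return True for a <div> that carries no attributes at all."""
    return tag.name == "div" and not tag.attrs


# find() calls the function on each tag and returns the first one it accepts.
bare_div = soup.find(is_bare_div)

# stripped_strings yields each text fragment with surrounding whitespace removed.
lines = list(bare_div.stripped_strings)
print(lines)
```

If more than one bare div can appear on the page, soup.find_all(is_bare_div) returns all of them, and the search can be narrowed by calling it on the parent element (divParent in the question) instead of on soup.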