I am using BeautifulSoup to parse some content from an HTML page.
I can extract the content I want from the HTML (i.e. the text contained in a span with the class myclass):
result = mycontent.find(attrs={'class':'myclass'})
I obtain this result:
<span class="myclass">Lorem ipsum<br/>dolor sit amet,<br/>consectetur...</span>
If I try to extract the text using:
result.get_text()
I obtain:
Lorem ipsumdolor sit amet,consectetur...
As you can see, when the <br> tags are removed there is no spacing left between the pieces of text, and adjacent words are concatenated.
How can I solve this issue?
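If you are using bs4, you can iterate over the element's strings generator, which yields each text fragment separately, and join the pieces with a space: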
" ".join(result.strings)
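To check this end to end, here is a minimal sketch that rebuilds the span from the question and joins its strings. The `html` literal, the `html.parser` choice, and the names `soup`, `joined`, and `via_get_text` are assumptions made for illustration. The `separator` argument of `get_text()` is not part of the answer above; it is shown as an alternative that should give the same result here.

```python
from bs4 import BeautifulSoup

# Assumed sample input, copied from the span shown in the question.
html = '<span class="myclass">Lorem ipsum<br/>dolor sit amet,<br/>consectetur...</span>'
soup = BeautifulSoup(html, "html.parser")
result = soup.find(attrs={"class": "myclass"})

# Join each text fragment with a single space, as suggested in the answer.
joined = " ".join(result.strings)

# Alternative: let get_text() insert the separator between fragments.
via_get_text = result.get_text(separator=" ")

print(joined)
print(via_get_text)
```

If the fragments carry leading or trailing whitespace in your real page, `result.stripped_strings` or `get_text(separator=" ", strip=True)` may be a better fit, since both trim each fragment before joining.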