I've tried with 3 different parsers: lxml, html5lib, html.parser
All of them output invalid HTML:
>>> BeautifulSoup('<br>', 'html.parser')
<br/>
>>> BeautifulSoup('<br>', 'lxml')
<html><body><br/></body></html>
>>> BeautifulSoup('<br>', 'html5lib')
<html><head></head><body><br/></body></html>
>>> BeautifulSoup('<br>', 'html.parser').prettify()
'<br/>\n'
All of them have /> "self-closing" void tags.
How can I get BeautifulSoup to output HTML that has void tags without />?
Use the html5 formatter:
If you pass in formatter="html5", it’s the same as formatter="html", but Beautiful Soup will omit the closing slash in HTML void tags like “br”:
from bs4 import BeautifulSoup
BeautifulSoup('<br>', 'html.parser').decode(formatter="html5")
Which outputs:
'<br>'
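The same formatter argument should also work with prettify(), and it does not depend on which parser built the tree. Below is a minimal sketch, assuming a Beautiful Soup 4 release recent enough to ship the "html5" formatter; the sample markup and the variable names are made up for illustration:

```python
from bs4 import BeautifulSoup

# Sample markup with two void tags (<br> and <hr>); illustrative only.
soup = BeautifulSoup('<p>one<br>two</p><hr>', 'html.parser')

# Default formatter ("minimal"): void tags are written with a closing slash.
default_out = soup.decode()

# "html5" formatter: void tags are written without the closing slash.
html5_out = soup.decode(formatter="html5")

# prettify() takes the same formatter argument.
pretty_out = soup.prettify(formatter="html5")

print(default_out)
print(html5_out)
print(pretty_out)
```

Note that str(soup) always uses the default formatter, so you have to call decode(), encode(), or prettify() explicitly with formatter="html5" to get output without the slash.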