I am using lxml.html to generate some HTML. I want to pretty print (with indentation) my final result into an HTML file. How do I do that?
This is what I have tried so far:
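import lxml.html as lh
from lxml.html import builder as E

# Build the outer <div class="scroll"> and nest the inner <div class="scrollContainer"> in it
sliderRoot = lh.Element("div", E.CLASS("scroll"), style="overflow-x: hidden; overflow-y: hidden;")
scrollContainer = lh.Element("div", E.CLASS("scrollContainer"), style="width: 4340px;")
sliderRoot.append(scrollContainer)

# encoding="unicode" makes tostring return a str instead of bytes on Python 3
print(lh.tostring(sliderRoot, pretty_print=True, method="html", encoding="unicode"))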
As you can see, I am passing the pretty_print=True argument. I thought that would give indented code, but it doesn't really help. This is the output:
<div style="overflow-x: hidden; overflow-y: hidden;" class="scroll"><div style="width: 4340px;" class="scrollContainer"></div></div>
I ended up using BeautifulSoup directly. It is the library that lxml.html.soupparser uses for parsing HTML.
BeautifulSoup has a prettify method that does exactly what it says it does. It prettifies the HTML with proper indents and everything.
BeautifulSoup will NOT fix the HTML, so broken code remains broken. But in this case, since the code is being generated by lxml, the HTML should be at least semantically correct.
For the example given in my question, I have to do this:
from bs4 import BeautifulSoup as bs

root = lh.tostring(sliderRoot)  # serialize the generated HTML to a string
soup = bs(root, "html.parser")  # parse it; naming the parser avoids bs4's "no parser specified" warning
prettyHTML = soup.prettify()  # re-indent the HTML
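Since the original goal was to get the indented result into a file, here is a minimal end-to-end sketch of the steps above for Python 3. The helper names `build_slider` and `prettify_element`, and the output path `slider.html` in the temp directory, are my own assumptions for illustration; they are not part of lxml or BeautifulSoup.

```python
import os
import tempfile

import lxml.html as lh
from lxml.html import builder as E
from bs4 import BeautifulSoup


def build_slider():
    """Build the nested <div> structure from the question (hypothetical helper)."""
    slider_root = lh.Element("div", E.CLASS("scroll"),
                             style="overflow-x: hidden; overflow-y: hidden;")
    scroll_container = lh.Element("div", E.CLASS("scrollContainer"),
                                  style="width: 4340px;")
    slider_root.append(scroll_container)
    return slider_root


def prettify_element(element):
    """Serialize an lxml element and re-indent it with BeautifulSoup (hypothetical helper)."""
    # encoding="unicode" gives a str rather than bytes
    raw = lh.tostring(element, encoding="unicode")
    # "html.parser" is the stdlib parser; it keeps the fragment as-is
    # instead of wrapping it in <html>/<body>
    soup = BeautifulSoup(raw, "html.parser")
    return soup.prettify()


pretty_html = prettify_element(build_slider())

# Assumed output location, chosen only for this example
out_path = os.path.join(tempfile.gettempdir(), "slider.html")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(pretty_html)
```

Note that prettify() puts each tag on its own line and indents by one space per nesting level by default, so the inner div ends up on a separate, indented line.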