I have this code:
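from lxml.html.clean import Cleaner

evil = "<script>malignus script</script><b>bold text</b><i>italic text</i>"
# Allow only <p>, <br> and <b>; remove_unknown_tags must be False when allow_tags is set
cleaner = Cleaner(remove_unknown_tags=False, allow_tags=['p', 'br', 'b'],
                  page_structure=True)
print(cleaner.clean_html(evil))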
I expected to get this:
<b>bold text</b>italic text
But instead I'm getting this:
<div><b>bold text</b>italic text</div>
Is there an attribute to remove the div tag wrapper?
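lxml expects your HTML to have a tree structure, i.e. a single root node. If the input does not have one, as with a fragment made of several sibling elements and loose text, lxml adds a root itself, which is the div you see in the output.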
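One way to work around the added root is to strip it from the string that clean_html returns. The following is only a sketch: strip_wrapper_div is a hypothetical helper (not part of lxml), and it assumes the wrapper is a bare <div> with no attributes, as in the output shown in the question. It leaves the string alone when the first <div> does not pair with the final </div>.

```python
import re

def strip_wrapper_div(html):
    """Remove one outer <div>...</div> if it encloses the whole string.

    Hypothetical helper, not an lxml API. Returns the input unchanged
    when no single bare <div> wraps everything.
    """
    text = html.strip()
    if not (text.startswith("<div>") and text.endswith("</div>")):
        return html
    inner = text[len("<div>"):-len("</div>")]
    # Check that the opening <div> really pairs with the final </div>:
    # the div nesting depth inside must never drop below zero.
    depth = 0
    for match in re.finditer(r"<(/?)div\b[^>]*>", inner):
        depth += -1 if match.group(1) else 1
        if depth < 0:
            return html
    return inner if depth == 0 else html

cleaned = "<div><b>bold text</b>italic text</div>"  # output from the question
print(strip_wrapper_div(cleaned))
```

With the cleaner from the question, the call would be strip_wrapper_div(cleaner.clean_html(evil)). Note that this cannot tell a wrapper added by lxml from a div that was already the single root of your input, so it only fits cases where div is not among the allowed tags.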