Parse HTML files in a directory and check whether they are badly formed in Python


I am hoping to write a script that walks through a directory and checks whether each HTML file in it is badly formed. Here is my code so far:

import os

directory = "html"
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith('.html'):
            # Help needed here: the condition below is pseudocode
            if file is badly formed:
                print("Badly Formed")
            else:
                print("Well Formed")

Solution

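  • import xml.etree.ElementTree as ETree
    ....
    
        try:
            # ElementTree is an XML parser, so this checks XML (XHTML) well-formedness
            self.doc = ETree.parse( file )
            # do stuff with it ...
        except ETree.ParseError as err:
            # bind the raised exception; its string form carries the message and position
            print( "ERROR in {0} : {1}".format( file, err ) )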
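
Putting the question's loop and the try/except above together, a minimal sketch might look like the following. The function name `check_html_files` and its return shape are my own choices for illustration, not something from the question or the answer. Keep in mind that `ElementTree` is a strict XML parser, so this treats each file as XHTML: ordinary HTML that browsers accept (for example an unclosed `<br>` or a `<!DOCTYPE html>` followed by named entities such as `&nbsp;`) may be reported as badly formed.

```python
import os
import xml.etree.ElementTree as ETree


def check_html_files(directory):
    """Walk `directory` and try to parse every .html file as XML.

    Returns a list of (path, error) pairs, where error is None if the
    file parsed cleanly and the parser's message otherwise.
    (Hypothetical helper, named for this example only.)
    """
    results = []
    for root, dirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.endswith('.html'):
                continue
            # os.walk yields bare file names, so join with the current folder
            path = os.path.join(root, name)
            try:
                ETree.parse(path)
            except ETree.ParseError as err:
                results.append((path, str(err)))
            else:
                results.append((path, None))
    return results


if __name__ == "__main__":
    for path, error in check_html_files("html"):
        if error is None:
            print("Well Formed: {0}".format(path))
        else:
            print("Badly Formed: {0} ({1})".format(path, error))
```

Note the `os.path.join(root, name)`: passing only the file name to the parser would fail for files in subdirectories, because `os.walk` does not change the working directory.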