python html-parser

Python class variable reset?


I have an HTML parser that subclasses HTMLParser from the standard library, like this:

class MyHTMLParser(HTMLParser):
    temp = ''
    def handle_data(self, data):
        MyHTMLParser.temp += data

I need the temp variable because I need to save the data so that I can access it somewhere else.

The code that uses the class looks like this:

for val in enumerate(mylist):
    parser = HTMLParser()
    parser.feed(someHTMLHere)
    string = parser.temp.strip().split('\n')

The problem is that temp keeps whatever was stored in it before. It doesn't reset, even though I create a new instance of the parser on every iteration. How do I clear this variable? I don't want it to carry over the contents from the previous loop iteration.


Solution

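  • As others have said, the problem is that you are appending the data to the class variable rather than to an instance variable. That happens because of the line MyHTMLParser.temp += data, which writes to the one attribute shared by every instance of the class.

    If you change it to self.temp += data, each instance accumulates its own data instead of piling it up on the class. (The first self.temp += data reads the empty class-level default and binds the result to a new instance attribute, so the class attribute itself stays empty.)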

    Here is a full working script:

    from html.parser import HTMLParser
    
    class MyHTMLParser(HTMLParser):
        temp = ""
    
        # Alternatively, initialize temp per instance in __init__
        # (personally, I would go this route):
        #def __init__(self):
        #    super().__init__()  # don't forget this call, or the parser will break
        #    self.temp = ""
    
        def handle_data(self, data):
            self.temp += data # <---Only real line change
    
    # Test variables
    someHTMLHere = ('<html><head><title>Test</title></head>'
                    '<body><h1>Parse me!</h1></body></html>')
    mylist = range(5)
    
    for val in enumerate(mylist):
        parser = MyHTMLParser()  # corrected typo: HTMLParser -> MyHTMLParser
        parser.feed(someHTMLHere)
        string = parser.temp.strip().split('\n')
    
        print(string)  # check the result of each iteration
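
To make the difference between the two approaches concrete, here is a minimal side-by-side sketch. The names SharedParser, PerInstanceParser and collect are made up for this illustration and do not come from the question. The first class writes to the class attribute as in the original code; the second sets temp in __init__, as suggested in the commented-out lines above.

```python
from html.parser import HTMLParser


class SharedParser(HTMLParser):
    """Hypothetical name: appends to the class attribute, shared by all instances."""
    temp = ""

    def handle_data(self, data):
        SharedParser.temp += data  # every instance writes to the same string


class PerInstanceParser(HTMLParser):
    """Hypothetical name: each instance gets its own temp in __init__."""

    def __init__(self):
        super().__init__()  # lets HTMLParser set up its internal state
        self.temp = ""

    def handle_data(self, data):
        self.temp += data  # only this instance's string grows


def collect(parser_cls, html, runs=3):
    """Illustrative helper: feed the same HTML to a fresh parser each run."""
    results = []
    for _ in range(runs):
        parser = parser_cls()
        parser.feed(html)
        parser.close()
        results.append(parser.temp)
    return results


html = "<p>hi</p>"
shared = collect(SharedParser, html)
per_instance = collect(PerInstanceParser, html)
print(shared)        # grows on every run: ['hi', 'hihi', 'hihihi']
print(per_instance)  # starts fresh every run: ['hi', 'hi', 'hi']
```

With the class attribute, creating a new parser does not help, because the accumulated text lives on the class and not on the object you just created. With the instance attribute, each new parser starts from an empty string.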