How to fix this AttributeError?


I installed a stripe package yesterday and now my app is not running. I am trying to understand where the problem is. Is it something to do with PyShell, HTMLParser, or something else? I am also posting with the GAE tag, hoping that the trace from the logs gives a clue about the problem:

MLStripper instance has no attribute 'rawdata'
Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/ting-1/1.354723388329082800/ting.py", line 2070, in post
    pitch_no_tags = strip_tags(pitch_original)
  File "/base/data/home/apps/ting-1/1.354723388329082800/ting.py", line 128, in strip_tags
    s.feed(html)
  File "/base/python_runtime/python_dist/lib/python2.5/HTMLParser.py", line 107, in feed
    self.rawdata = self.rawdata + data
AttributeError: MLStripper instance has no attribute 'rawdata'

This is MLStripper:

from HTMLParser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        set()
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

MLStripper was working fine until yesterday.

And these are my other questions:

https://stackoverflow.com/questions/8152141/how-to-fix-this-attributeerror-with-htmlparser-py

https://stackoverflow.com/questions/8153300/how-to-fix-a-corrupted-pyshell-py


Solution

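  • The main issue with the code you posted is that MLStripper.__init__ never calls the base-class initializer. HTMLParser.__init__ calls reset(), which is what creates rawdata; without that call, feed() fails with the AttributeError shown in your traceback.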

    Try running this amended version of your script:

    from HTMLParser import HTMLParser
    
    class MLStripper(HTMLParser):
        def __init__(self):
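            # initialize the base class; HTMLParser is an old-style class
            # in Python 2, so super() cannot be used here
            HTMLParser.__init__(self)
            # start with an empty output buffer
            self._lines = []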
    
        def read(self, data):
            # clear the current output before re-use
            self._lines = []
            # re-set the parser's state before re-use
            self.reset()
            self.feed(data)
            return ''.join(self._lines)
    
        def handle_data(self, d):
            self._lines.append(d)
    
    def strip_tags(html):
        s = MLStripper()
        return s.read(html)
    
    html = """Python's <code>easy_install</code>
     makes installing new packages extremely convenient.
     However, as far as I can tell, it doesn't implement
     the other common features of a dependency manager -
     listing and removing installed packages."""
    
    print(strip_tags(html))
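
To make the cause of the error concrete, here is a minimal sketch that puts the failing and the working initializer side by side. The class names BrokenStripper and FixedStripper are hypothetical and used only for illustration, and the try/except import is an assumption added so the snippet also runs on Python 3 (the question itself targets Python 2.5).

```python
try:
    from HTMLParser import HTMLParser      # Python 2 module name
except ImportError:
    from html.parser import HTMLParser     # Python 3 module name (assumption)


class BrokenStripper(HTMLParser):
    """Hypothetical class: mirrors the mistake in the question."""
    def __init__(self):
        # The base-class __init__ is never called, so reset() never runs
        # and self.rawdata is never created.
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)


class FixedStripper(HTMLParser):
    """Hypothetical class: same parser with the base class initialized."""
    def __init__(self):
        # An explicit base-class call works for old-style and new-style classes.
        HTMLParser.__init__(self)
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    s = FixedStripper()
    s.feed(html)
    s.close()  # flush any text the parser is still buffering
    return s.get_data()


if __name__ == "__main__":
    try:
        BrokenStripper().feed("<b>x</b>")
    except AttributeError as exc:
        print("broken: %s" % exc)
    print("fixed: %s" % strip_tags("Python's <code>easy_install</code> is handy."))
```

Calling HTMLParser.__init__(self) directly, rather than through super(), is the conservative choice here because it does not depend on whether the base class is new-style.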