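from bz2 import BZ2File
import xml.parsers.expat

file = BZ2File(SOME_FILE_PATH)
p = xml.parsers.expat.ParserCreate()
p.Parse(file)  # fails: Parse() expects a string, not a file object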
Here's code that tries to parse an XML file compressed with bz2. Unfortunately, it fails with this message:
TypeError: Parse() argument 1 must be string or read-only buffer, not bz2.BZ2File
Is there a way to parse bz2-compressed XML files on the fly?
Note: p.Parse(file.read()) is not an option here. The file is larger than available memory, so I need to parse it as a stream.
Just use p.ParseFile(file) instead of p.Parse(file).
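Parse() takes a string, whereas ParseFile() takes a file-like object (anything with a read() method) and reads the data from it in chunks as required, so the whole file never has to fit in memory.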
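As a rough illustration, here is a minimal sketch of the streaming approach. It assumes Python 3, where the file should be opened in binary mode; the names count_elements and on_start are hypothetical and only serve the example, which counts start tags to show that handlers fire while the data is being decompressed.

```python
import bz2
import xml.parsers.expat


def count_elements(path):
    """Stream-parse a bz2-compressed XML file and count start tags by name."""
    counts = {}

    def on_start(name, attrs):
        # Called by expat for every opening tag as the stream is read.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start

    # BZ2File decompresses on demand; ParseFile() pulls data via read().
    with bz2.BZ2File(path, "rb") as f:
        parser.ParseFile(f)

    return counts
```

Calling count_elements(SOME_FILE_PATH) would return a dict mapping tag names to counts. Any real processing would go in the handler functions instead of the counter.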
Ref: http://docs.python.org/library/pyexpat.html#xml.parsers.expat.xmlparser.ParseFile