
EOFError: compressed file ended before the logical end-of-stream was detected when decompressing a bz2 file


I get this error when I try to decompress a Wikipedia dump so I can use its .xml file. How can I solve it?

import bz2
import os

filepath = '/Data/nlp/ESA/Wiki-ESA-master'
file_name = os.path.join(filepath, 'enwiki-latest-pages-articles.xml.bz2')
with bz2.BZ2File(file_name) as zipfile:  # open the compressed file
    DEFAULT_FILENAME = zipfile.read()  # read all decompressed data into memory

error:

EOFError: compressed file ended before the logical end-of-stream was detected

Solution

  • As the error says, the compressed data ends before the bz2 end-of-stream marker is reached, so the download most likely stopped prematurely and left you with a truncated file. Try downloading it again.

    Another possible cause is corrupted data on your disk. Downloading again may help with this too.
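Before processing the dump, you could check the downloaded file first. The following is a minimal sketch; `bz2_is_complete` and `file_digest` are hypothetical helper names and the 1 MiB chunk size is an arbitrary choice. The first helper decompresses the file in chunks and reports whether it ends early (the `EOFError` case); reading in chunks also avoids holding the whole decompressed dump in memory the way `zipfile.read()` does in the question. Corrupted (rather than truncated) data usually surfaces as an `OSError` instead, which this sketch deliberately does not catch. The second helper hashes the compressed file so you can compare it against a published checksum, assuming you take the expected value from the md5/sha1 checksum file that Wikimedia publishes alongside each dump.

```python
import bz2
import hashlib


def bz2_is_complete(path, chunk_size=1 << 20):
    """Decompress `path` chunk by chunk, discarding the output.

    Returns True if the file decompresses to its end-of-stream marker,
    False if it ends early (the EOFError from the question).
    """
    try:
        with bz2.open(path, "rb") as f:
            while f.read(chunk_size):
                pass  # only checking integrity, so the data is thrown away
    except EOFError:
        return False
    return True


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash the *compressed* file in chunks; compare with the published checksum."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()
```

For example, `bz2_is_complete('/Data/nlp/ESA/Wiki-ESA-master/enwiki-latest-pages-articles.xml.bz2')` returning `False` would confirm the download is truncated and should be fetched again.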