pythonpython-2.7tarfile

Python 'tarfile' module unable to handle non utf-8 filenames in tar file


I am using Python 2.7 with 'tarfile' module, the tar file I am processing has filenames that are not unicode, and 'tarfile' module is erroring out. How do I tell the 'tarfile' model to ignore the errors? The code below gives me the error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa2 in position 5: invalid start byte
if tar_flag:
    tar_obj = tarfile.open(infile, 'r', encoding='utf-8', errors='surrogateescape')
    try:
        member_list = tar_obj.getmembers()
        for member in member_list:
            if member.isfile():
                if member.name.lower().endswith('.exe'):
                    tar_obj.extract(member.name, path='/home/tar/')
                    print 'extracting {}'.format(member.name)
    except Exception as err:
        print err

Solution

  • Python 2.x does not support UTF, so I switched to Python 3 and this worked.