I am using Python 2.7 with 'tarfile' module, the tar file I am processing has filenames that are not unicode, and 'tarfile' module is erroring out. How do I tell the 'tarfile' model to ignore the errors? The code below gives me the error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa2 in position 5: invalid start byte
if tar_flag:
tar_obj = tarfile.open(infile, 'r', encoding='utf-8', errors='surrogateescape')
try:
member_list = tar_obj.getmembers()
for member in member_list:
if member.isfile():
if member.name.lower().endswith('.exe'):
tar_obj.extract(member.name, path='/home/tar/')
print 'extracting {}'.format(member.name)
except Exception as err:
print err
Python 2.x does not support UTF, so I switched to Python 3 and this worked.