
Python zipfile does not unzip folders for windows zip archive


I have a zip file that was created on a Windows machine using System.IO.Compression.ZipFile (the archive contains many files and folders). I have Python code running on a Linux machine (a Raspberry Pi, to be exact) that has to unzip the archive and create all the necessary folders and files. I'm using Python 3.5.0 and the zipfile library. This is the sample code:

import zipfile

with zipfile.ZipFile("MyArchive.zip", "r") as archive:
    archive.extractall()

Now when I run this code, instead of getting a nice unzipped directory tree, I get all the files in the root directory with weird names like Folder1\Folder2\MyFile.txt.

My assumption is that since the zip archive was created on Windows, where the directory separator is \, whereas on Linux it is /, the Python zipfile library treats \ as part of the file name instead of as a directory separator. Also note that when I extract this archive manually (not through Python code), all the folders are created as expected, so this seems to be a problem with the zipfile library. Another note: zip archives that were created with a different tool (not System.IO.Compression.ZipFile) extract correctly with the same Python code.
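The assumption above can be checked by printing the entry names stored in the archive. A minimal sketch, assuming a Linux host; the helper names are hypothetical, and a tiny stand-in archive is built in memory (instead of reading MyArchive.zip) so the snippet runs on its own:

```python
import io
import zipfile


def make_sample_archive():
    """Build a tiny stand-in archive whose entry name uses backslashes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Folder1\\Folder2\\MyFile.txt", "hello")
    return buffer.getvalue()


def entry_names(zip_bytes):
    """Return the raw entry names stored in a zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.namelist()


names = entry_names(make_sample_archive())
# Names containing "\" and no "/" point to the separator mismatch.
print(names)
```

For the real archive, calling namelist() on zipfile.ZipFile("MyArchive.zip") gives the same kind of listing.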

Any insight on what's going on and how to fix it?


Solution

  • What is happening is that while Windows recognizes both \ (path.sep) and / (path.altsep) as path separators, Linux only recognizes / (path.sep).

    As @blhsing's answer shows, the existing implementation of ZipFile always ensures that path.sep and / are considered valid separator characters. That means that on Linux, \ is treated as a literal part of the file name. To change that, you can set os.path.altsep to \, since the extraction code also converts the alternate separator whenever it is not None or empty.

    If you go down the road of modifying ZipFile itself, as the other answer suggests, just add a line that blindly changes \ to path.sep, since / is always changed already anyway. That way /, \ and possibly path.altsep will all be converted to path.sep. This is what the command-line tool appears to be doing.
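Both options can be sketched from the calling side, without editing the library. This is a rough illustration, not a definitive fix: all function names are hypothetical. Option 1 assumes zipfile reads os.path.altsep during extraction and, on some versions, os.altsep when parsing entry names; these are two separate names, so both are patched. Option 2 swaps in a different technique from the one described above: instead of modifying ZipFile, it rewrites each in-memory entry name before extracting it.

```python
import contextlib
import os
import zipfile


@contextlib.contextmanager
def backslash_as_altsep():
    """Temporarily treat "\\" as the alternate path separator (global change)."""
    saved = (os.altsep, os.path.altsep)
    # Assumption: zipfile consults one or both of these names, depending on
    # the Python version, so both are patched and then restored.
    os.altsep = os.path.altsep = "\\"
    try:
        yield
    finally:
        os.altsep, os.path.altsep = saved


def extract_with_altsep(zip_path, dest_dir="."):
    """Option 1: extract while "\\" is registered as the alternate separator."""
    # The patch is applied before the archive is opened and undone afterwards.
    with backslash_as_altsep(), zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)


def extract_with_renamed_entries(zip_path, dest_dir="."):
    """Option 2: rewrite each in-memory entry name before extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Only the ZipInfo object changes; the archive file is not modified.
            info.filename = info.filename.replace("\\", "/")
            archive.extract(info, dest_dir)
```

Either function would be called as, for example, extract_with_renamed_entries("MyArchive.zip"). Option 1 changes process-wide state while it runs, so it is best avoided when other threads handle paths at the same time. Option 2 touches nothing outside the ZipInfo objects it iterates over.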