python tarfile

How can I extract files from a tarfile to a different destination filename in Python?


I have a tarfile.TarFile from which I would like to extract some files to a modified destination filename; there is an existing file with the same name as the archive member that I do not want to touch. Specifically, I want to append a suffix, e.g. a member in the archive called foo/bar.txt should be extracted as foo/bar.txt.mysuffix.

The two somewhat obvious but also somewhat unsatisfactory approaches are:

Is there no interface or hook on TarFile that would allow this to be implemented concisely and correctly?


Solution

  • Looking through Lib/tarfile.py, I came across this comment:

        #--------------------------------------------------------------------------
        # Below are the different file methods. They are called via
        # _extract_member() when extract() is called. They can be replaced in a
        # subclass to implement other functionality.
    
        def makedir(self, tarinfo, targetpath):
            # ...

        def makefile(self, tarinfo, targetpath):
            # ...
    

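    These methods are not mentioned in the official reference documentation, but they appear to be fair game. To override them on an existing, already open TarFile instance, we can create a subclass that acts as a facade/wrapper, overriding the hooks of interest and reading the remaining instance state from the wrapped object: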

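    import tarfile

    class SuffixingTarFile(tarfile.TarFile):
        def __init__(self, suffix: str, wrapped: tarfile.TarFile):
            # Deliberately skip TarFile.__init__: the archive state lives in `wrapped`.
            self.suffix = suffix
            self.wrapped = wrapped

        def __getattr__(self, attr):
            # Only called when normal lookup fails, so instance state set by
            # TarFile.__init__ (fileobj, members, ...) is read from `wrapped`.
            return getattr(self.wrapped, attr)

        def makefile(self, tarinfo, targetpath):
            super().makefile(tarinfo, targetpath + self.suffix)

        # Override makedir, makelink, makefifo, etc. as desired.
        # Note: _extract_member() still passes the unsuffixed path to
        # chown(), chmod() and utime().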
    

    Example:

    tar = tarfile.open(...)
    star = SuffixingTarFile(".foo", tar)
    star.extractall()  # extracts all (regular) file members with .foo suffix appended
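
If you control how the archive is opened, the wrapper may not be needed: TarFile.open is a classmethod, so calling it on a subclass returns an instance of that subclass. The following is a minimal sketch under that assumption. It also assumes that chown, chmod and utime receive the same unsuffixed target path as makefile, which is what CPython's Lib/tarfile.py appears to do, but these hooks are undocumented and may change. The `suffix` class attribute, the `_suffixed` helper and the `demo` function are illustrative names, not part of tarfile.

```python
import io
import os
import tarfile
import tempfile


class SuffixingTarFile(tarfile.TarFile):
    """TarFile that extracts regular files to '<name><suffix>' (sketch)."""

    suffix = ".mysuffix"  # hypothetical default; override per subclass or instance

    def _suffixed(self, tarinfo, targetpath):
        # Only regular files are renamed; directories, links, etc. keep their names.
        return targetpath + self.suffix if tarinfo.isreg() else targetpath

    def makefile(self, tarinfo, targetpath):
        super().makefile(tarinfo, self._suffixed(tarinfo, targetpath))

    # The metadata hooks get the unsuffixed path too, so redirect them as well;
    # otherwise they would act on the pre-existing file.
    def chown(self, tarinfo, targetpath, numeric_owner):
        super().chown(tarinfo, self._suffixed(tarinfo, targetpath), numeric_owner)

    def chmod(self, tarinfo, targetpath):
        super().chmod(tarinfo, self._suffixed(tarinfo, targetpath))

    def utime(self, tarinfo, targetpath):
        super().utime(tarinfo, self._suffixed(tarinfo, targetpath))


def demo():
    """Extract a tiny archive next to an existing file; return both contents."""
    with tempfile.TemporaryDirectory() as tmp:
        # Build an archive containing a single member, foo/bar.txt.
        archive = os.path.join(tmp, "demo.tar")
        payload = b"new"
        with tarfile.open(archive, "w") as tar:
            info = tarfile.TarInfo("foo/bar.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

        # Create the pre-existing file that must not be touched.
        dest = os.path.join(tmp, "out")
        os.makedirs(os.path.join(dest, "foo"))
        existing = os.path.join(dest, "foo", "bar.txt")
        with open(existing, "wb") as f:
            f.write(b"old")

        # Pass filter="data" only where this Python supports extraction filters.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with SuffixingTarFile.open(archive) as tar:
            tar.extractall(dest, **kwargs)

        with open(existing, "rb") as f:
            old = f.read()
        with open(existing + SuffixingTarFile.suffix, "rb") as f:
            new = f.read()
        return old, new
```

Here demo() is expected to return the untouched original content together with the newly extracted content. Checking tarinfo.isreg() in every hook keeps the data path and the metadata paths consistent, so non-regular members are extracted under their original names.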