python tarfile

Python Renaming tar file contents without extracting


I have a TAR file that is created on one machine and extracted on another. Recently I learned that some coworkers are opening the tar in 7-zip and renaming some of the folders. As an example, the created tar, my_tar.tar, contains:

├── dir1
│   ├── sub-dir1
│   │    ├── file1
│   │    ├── file2
│   ├── sub-dir2
│   │    ├── file3
│   ├── file4
├── dir2
│   ├── file5
├── dir3
│   ├── file6
│   ├── file7

Previously, they placed the TAR in their home directory on a RHEL machine (/home/user1/my_tar.tar), extracted it there, and then moved sub-dir1, sub-dir2, and file4 to /dir1.

To avoid that manual move on the target RHEL machine, they started opening the file in 7-zip, right-clicking on dir1, selecting rename, and putting a '/' in front (so that it now reads /dir1). They then move the TAR to their home directory (/home/user1/my_tar.tar) and extract it.

The TAR is created by a Python script, and I am trying to do that rename as part of the script. Right now I have dir2 and dir3 (with their sub-elements) successfully added to the tar via

import tarfile

# tar_file_path is a local path the tar file should be created at
with tarfile.open(tar_file_path, "w", format=tarfile.GNU_FORMAT) as tfh:
    # unmod_path is a Path to the directory containing dir2 and dir3
    for sub_path in unmod_path.iterdir():
        tfh.add(sub_path, arcname=sub_path.name)

I have looked at the related questions and have not found a solution that does not involve extracting files to a different location. I am specifically looking for a way to do the equivalent of the 7-zip rename, either while building the TAR or after it has been created.

I've tried adding the following loop to my with statement above. Each of the commented-out lines in the loop represents a different form of the add call that I have tried.

    # mod_path is a Path to the directory containing dir1
    for sub_path in mod_path.iterdir():
        # tfh.add(sub_path, arcname=f"/{sub_path.name}")  # 1
        # tfh.add(sub_path, arcname=f"//{sub_path.name}")  # 2
        # tfh.add(sub_path, arcname=f"\\{sub_path.name}")  # 3
        # tfh.add(sub_path, arcname=f"/\\{sub_path.name}")  # 4
        # tfh.add(sub_path, arcname=f"//\\{sub_path.name}")  # 5
        # tfh.add(sub_path, arcname=f"\\/{sub_path.name}")  # 6
        # tfh.add(sub_path, arcname=f"\\//{sub_path.name}")  # 7
        pass

None of them worked. Some resulted in no change, while others resulted in an empty directory in the tar with no name, an empty directory named dir1, and both sub-dir1 and sub-dir2 at the same level as dir2 and dir3.

I have also tried creating TarInfo objects (using tarfile's gettarinfo) and adding them to the tar one file at a time (using tarfile's addfile) for all files under the dir1 directory (file1, file2, file3, and file4). Most often this reported that all files were added, but extracting the TAR led to errors on those files, and they appeared to be corrupted when opened with Notepad++.


Solution

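  • The answer is to use the filter parameter of the add method. In CPython's tarfile, add strips any leading "/" from arcname, which is why passing arcname="/dir1" has no effect. The filter function receives each member's TarInfo after that normalization, so a function that prepends "/" to the name works. The code is as follows: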

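    # mod_path is a Path to the directory containing dir1
    def adjust_name(info: tarfile.TarInfo) -> tarfile.TarInfo:
        # Called once per member, including files and sub-directories
        # that add() picks up recursively.
        info.name = "/" + info.name
        return info

    # This loop goes inside the same with block that created tfh
    for sub_path in mod_path.iterdir():
        tfh.add(sub_path, arcname=sub_path.name, filter=adjust_name)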
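
To check the result without extracting anything, the whole flow can be put together roughly as below. This is only a sketch: make_demo_tree, build_tar, and member_names are hypothetical helper names that are not part of the original script, and the directory layout is a throwaway copy of the example tree built in a temporary directory.

```python
import tarfile
import tempfile
from pathlib import Path


def adjust_name(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Prepend "/" to the member name that add() has already normalized."""
    info.name = "/" + info.name
    return info


def make_demo_tree(root: Path) -> tuple:
    """Hypothetical helper: recreate the example layout under root."""
    unmod_path = root / "unmod"  # will contain dir2 and dir3
    mod_path = root / "mod"      # will contain dir1
    files = {
        unmod_path: ["dir2/file5", "dir3/file6", "dir3/file7"],
        mod_path: [
            "dir1/sub-dir1/file1",
            "dir1/sub-dir1/file2",
            "dir1/sub-dir2/file3",
            "dir1/file4",
        ],
    }
    for base, rel_paths in files.items():
        for rel in rel_paths:
            target = base / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(rel)  # dummy content: the relative path
    return unmod_path, mod_path


def build_tar(tar_file_path: Path, unmod_path: Path, mod_path: Path) -> None:
    """Hypothetical helper: add dir2/dir3 unchanged and dir1 as /dir1."""
    with tarfile.open(tar_file_path, "w", format=tarfile.GNU_FORMAT) as tfh:
        for sub_path in unmod_path.iterdir():
            tfh.add(sub_path, arcname=sub_path.name)
        for sub_path in mod_path.iterdir():
            tfh.add(sub_path, arcname=sub_path.name, filter=adjust_name)


def member_names(tar_file_path: Path) -> list:
    """Hypothetical helper: list the stored member names, sorted."""
    with tarfile.open(tar_file_path) as tfh:
        return sorted(tfh.getnames())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        unmod_path, mod_path = make_demo_tree(root)
        tar_file_path = root / "my_tar.tar"
        build_tar(tar_file_path, unmod_path, mod_path)
        for name in member_names(tar_file_path):
            print(name)
```

Listing the names this way shows whether dir1 and everything under it carry the leading "/" while dir2 and dir3 stay relative. Whether that slash is honored at extraction time depends on the extracting tool: GNU tar, for example, strips a leading "/" from member names unless it is run with -P (--absolute-names).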