
Getting a single file from a tar file using the tarfile lib in python


I am trying to grab a single file from a tar archive. With the tarfile library I can, for example, find the members that have a particular extension, like the example in the tarfile documentation:

    import os
    import tarfile

    def xml_member_files(members):
        # Yield only the members whose name ends in ".xml"
        for tarinfo in members:
            if os.path.splitext(tarinfo.name)[1] == ".xml":
                yield tarinfo

    with tarfile.open("mytarfile.tar") as tar:
        for m in xml_member_files(tar):
            print(m.name)

This works, and the output is:

RS2_C0RS2_OK67683_PK618800_DK549742_SLA23_20151006_234046_HH_SLC/lutBeta.xml
RS2_C0RS2_OK67683_PK618800_DK549742_SLA23_20151006_234046_HH_SLC/lutGamma.xml
RS2_C0RS2_OK67683_PK618800_DK549742_SLA23_20151006_234046_HH_SLC/lutSigma.xml
RS2_C0RS2_OK67683_PK618800_DK549742_SLA23_20151006_234046_HH_SLC/product.xml

If I look up just product.xml by name, it doesn't work. I tried this:

    ti = tar.getmember('product.xml')
    print(ti.name)

It doesn't find product.xml, I'm guessing because of the directory path in front of the file name. I have no idea how to retrieve just that path so I can get at product.xml once it is extracted (it feels like I am doing things the hard way anyway). How do I figure out that path, so I can join it with the file name in my other functions and read and load the XML file after extracting only that one file from the tar archive?
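The guess above can be checked with a minimal sketch: `TarFile.getmember()` looks members up by their full name inside the archive, so a bare file name raises `KeyError`, while `os.path.dirname()` on a member's name recovers the directory prefix. The sketch builds a small in-memory archive; the `PRODUCT_DIR` prefix and the `build_sample_tar` helper are hypothetical stand-ins for the real `RS2_...` directory, not part of the original question.

```python
import io
import os
import tarfile


def build_sample_tar():
    """Build a small in-memory tar whose members sit under a directory prefix.

    "PRODUCT_DIR" is a hypothetical stand-in for the long RS2_... directory.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("PRODUCT_DIR/product.xml", "PRODUCT_DIR/lutBeta.xml"):
            data = b"<product/>"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)  # rewind so the archive can be read back
    return buf


with tarfile.open(fileobj=build_sample_tar(), mode="r") as tar:
    # getmember() matches the full archive name, so a bare file name fails
    try:
        tar.getmember("product.xml")
        found_bare = True
    except KeyError:
        found_bare = False

    # The full name, including the directory prefix, succeeds
    member = tar.getmember("PRODUCT_DIR/product.xml")
    prefix = os.path.dirname(member.name)  # the path information in front

print(found_bare)  # False
print(prefix)      # PRODUCT_DIR
```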


Solution

  • Get the full path by filtering the names returned by getnames(). For example, to get the full path of lutBeta.xml:

    import os
    import tarfile
    tar = tarfile.open('mytarfile.tar')  # auto-detects compression, unlike TarFile()
    membername = [x for x in tar.getnames() if os.path.basename(x) == 'lutBeta.xml'][0]
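To go from the matched name to a file on disk that can be loaded, one option is to extract only that member and join its full archive name onto the destination directory. The sketch below assumes relative member names (the usual case); `find_member` and `extract_single` are hypothetical helpers, and the `"data"` extraction filter is only passed on Python versions whose tarfile module provides it.

```python
import os
import tarfile


def find_member(tar, basename):
    """Return the first regular-file TarInfo whose base name matches, or None."""
    for member in tar.getmembers():
        if member.isfile() and os.path.basename(member.name) == basename:
            return member
    return None


def extract_single(tar_path, basename, dest):
    """Extract only `basename` from the archive and return its path on disk."""
    with tarfile.open(tar_path) as tar:
        member = find_member(tar, basename)
        if member is None:
            raise KeyError(f"{basename!r} not found in {tar_path!r}")
        # Use the safer "data" extraction filter where this Python provides it
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        # extract() creates any missing parent directories under dest
        tar.extract(member, path=dest, **kwargs)
    # member.name keeps the directory prefix, so join it onto dest
    return os.path.join(dest, member.name)
```

Called as `extract_single('mytarfile.tar', 'product.xml', some_dir)`, the returned path already includes the `RS2_...` prefix, so it can be passed straight to an XML parser such as `xml.etree.ElementTree.parse()`. If nothing needs to be written to disk, `tar.extractfile(member)` returns a file-like object that can be parsed directly instead.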