Extract only jpg files from a .tar.gz file using python


Problem Summary: One of my folders contains a .tar.gz file, and I need to extract all the images (.jpg and .png) from it. I have to locate the archive by its .tar.gz extension (using the path to the directory) rather than the usual way of passing the file name as input. I need this for one part of a Tkinter GUI for an image classification project.

Code I'm trying:

import os
import tarfile

def extractfile():
    os.chdir('GUI_Tkinter/PMC_downloads')
    with tarfile.open(os.path.join(os.environ['GUI_Tkinter/PMC_downloads'], f'Backup_{self.batch_id}.tar.gz'), "r:gz") as so:
        so.extractall(path=os.environ['GUI_Tkinter/PMC_downloads'])

The code does not raise any error, but it does not extract anything either. Please suggest another way to do this that finds the archive by its .tar.gz extension.


Solution

  • A generic way to extract from one or more .tar.gz files in a folder without specifying the file name: the archives are found by their extension and location, and the members to extract are selected by their own extension. You can pull any type of file (.pdf, .nxml, .xml, .gif, etc.) out of the archive just by changing the extension checked against the member name. I needed all the images from the .tar.gz file in one folder, so the code below checks for .jpg and .png and extracts every match into a folder named "Extracted_Images" in the current directory. To extract somewhere else, pass a different path as the second argument to extract(). Note that the tarfile module only reads tar archives; a zip file would need the zipfile module instead.

    For example "C:/Users/dell/project/histo_images" instead of "Extracted_Images".

    
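    import glob
    import tarfile

    # Find every .tar.gz archive in the current working directory.
    archives = glob.glob("*.tar.gz")

    for archive in archives:
        # The with-block closes the archive even if extraction fails.
        with tarfile.open(archive, 'r') as t:
            for member in t.getmembers():
                # Compare the real extension (case-insensitive) instead of a
                # substring, so names like "notes.jpg.txt" are not matched.
                if member.name.lower().endswith((".jpg", ".png")):
                    t.extract(member, "Extracted_Images")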
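If the extraction has to be called from GUI code with an explicit folder rather than the current working directory, the same idea can be wrapped in a function. This is only a sketch under some assumptions: the name extract_by_extension, its parameters, and the IMAGE_EXTENSIONS default are made up for illustration and are not part of the original answer. Unlike the loop above, it writes each match under its base name only, so folder structure inside the archive is dropped and member paths cannot point outside the destination.

```python
import shutil
import tarfile
from pathlib import Path

# Assumed default; pass any other extensions (e.g. (".pdf", ".xml")) as needed.
IMAGE_EXTENSIONS = (".jpg", ".png")


def extract_by_extension(folder, dest, extensions=IMAGE_EXTENSIONS):
    """Hypothetical helper: extract matching files from every .tar.gz in folder.

    Returns the list of paths written under dest.
    """
    folder, dest = Path(folder), Path(dest)
    extensions = tuple(ext.lower() for ext in extensions)
    dest.mkdir(parents=True, exist_ok=True)
    written = []

    # Archives are located by extension, never by a hard-coded file name.
    for archive in sorted(folder.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            for member in tar.getmembers():
                # Skip directories, links, and files with other extensions.
                if not member.isfile():
                    continue
                if not member.name.lower().endswith(extensions):
                    continue
                # Keep only the base name so every file lands directly in dest.
                target = dest / Path(member.name).name
                with tar.extractfile(member) as src, open(target, "wb") as out:
                    shutil.copyfileobj(src, out)
                written.append(target)
    return written
```

A call such as extract_by_extension("GUI_Tkinter/PMC_downloads", "Extracted_Images") would then cover the folder from the question. Because the directory structure is flattened, two members with the same base name overwrite each other; if that matters, fall back to t.extract(member, path) as in the loop above.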