python machine-learning deep-learning image-preprocessing

How to choose some specific images from a large dataset of images?


My dataset has 366 folders, one for each day of a year, and each folder contains 51-55 images, of which I need only 36 for building a neural network. Can I assign an index to those images and select some of them based on that index? Can someone suggest code for doing so?

folders in my dataset

images inside a folder


Solution

  • Every element in a list has its own index, so first you could create lists with all the filenames.

    You can use os.listdir() to get all the folders, and then os.listdir(folder) for every folder to get the list of filenames in that folder.

    import os

    base = '/home/furas/images/2021'

    # os.listdir() returns names in arbitrary order, so sort them
    # to keep the indexes stable between runs
    folders = sorted(os.listdir(base))

    all_filenames = []

    for folder_name in folders:

        # add base path to folder name to have full path
        full_path = os.path.join(base, folder_name)
        print(full_path)

        # get filenames (without path) in folder, in sorted order
        filenames = sorted(os.listdir(full_path))

        # add path to filenames
        filenames = [os.path.join(full_path, name) for name in filenames]

        all_filenames.append(filenames)

    print(all_filenames)
    

    This way you have a 2D list with all the filenames (with full paths), and you can select from it.

    The first 36 images of one day (day_index is the position of that day's folder in the list):

    selected = all_filenames[day_index][:36]
    

    The first 36 images of every day:

    selected = []
    
    for day in all_filenames:
        selected.append( day[:36] )
    

    36 random images from every day:

    import random
    
    selected = []
    
    for day in all_filenames:
        # random.sample() picks 36 distinct images (no repeats)
        selected.append(random.sample(day, 36))
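
Since the question asks about picking images by index, the same lists also allow choosing specific positions instead of the first 36 or a random 36. Below is a minimal sketch, assuming the filenames in each folder are in a stable (for example sorted) order so that an index always refers to the same image. The helper names `evenly_spaced_indices` and `pick_by_index` are hypothetical, and the sample filenames are made up for illustration.

```python
def evenly_spaced_indices(total, count):
    """Return `count` distinct indices spread evenly over range(total)."""
    if count > total:
        raise ValueError("cannot pick more images than the folder contains")
    # integer arithmetic keeps the result deterministic
    return [i * total // count for i in range(count)]


def pick_by_index(filenames, indices):
    """Return the filenames found at the given positions."""
    return [filenames[i] for i in indices]


# hypothetical example: one day with 52 images
day = [f"img_{n:02d}.png" for n in range(52)]

indices = evenly_spaced_indices(len(day), 36)
selected = pick_by_index(day, indices)

print(len(selected))  # 36
print(selected[:3])
```

To apply this to every day, the two helpers could be called once per inner list of all_filenames, passing len(day) as the total. Any other list of indices (for example a hand-written one) can be passed to pick_by_index in the same way.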