python optimization python-os

Find subfolders which contain images


What is the most efficient way to get the paths of subfolders that contain image files? For example, suppose this is my input structure:

inputFolder    
│
└───subFolder1
│   │
│   └───subFolder11
│       │   file1.jpg
│       │   file2.jpg
│       │   ...
│   
└───folder2
    │   file021.jpg
    │   file022.jpg

If I call getFolders(inputPath), it should return a list of the folders containing images: ['inputFolder/subFolder1/subFolder11', 'inputFolder/folder2']

Currently I'm using my library TreeHandler, which is just a wrapper around os.walk, to get all the files.

import os
from treeHandler import treeHandler
th=treeHandler()
tempImageList=th.getFiles(path,['jpg'])
# tempImageList is a list of the paths of all files with the '.jpg' extension

# This is the filtering step, the line that needs optimisation.
subFolderList=list(set(list(map(lambda x:os.path.join(*x.split('/')[:-1]),tempImageList))))

I think it can be done more efficiently.

Thanks in advance


Solution

  • Adding the code used for verification

    import os
    from treeHandler import treeHandler
    import time
    
    def remove_tail(path):
        # Drop everything from the last '/' onwards; a bare file name maps to '.'
        index = path.rfind('/')
        return path[:index] if index != -1 else '.'
    
    th=treeHandler()
    tempImageList= th.getFiles('JPEGImages',['jpg'])
    # tempImageList is a list of the paths of all files with the '.jpg' extension
    print(len(tempImageList))
    
    # Original method from the question
    start = time.perf_counter()
    originalSubFolderList=list(set(list(map(lambda x:os.path.join(*x.split('/')[:-1]),tempImageList))))
    print("Current method takes", time.perf_counter() - start)
    
    # New method: cut each path at its last '/' and deduplicate with a set
    start = time.perf_counter()
    newSubFolderList = list({remove_tail(path) for path in tempImageList})
    print("New method takes", time.perf_counter() - start)
    
    # Compare as sets, because the order of a list built from a set is arbitrary
    print("Do the outputs match:", set(originalSubFolderList) == set(newSubFolderList))
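If the full list of files is not needed for anything else, another option is to skip it and collect the folders directly while walking the tree. The following is only a minimal sketch under that assumption: it uses os.walk from the standard library rather than treeHandler, and the name get_image_folders and its extensions parameter are made up for this illustration. It has not been timed against the methods above.

```python
import os


def get_image_folders(root, extensions=(".jpg",)):
    """Return the folders under root that directly contain a matching file.

    Hypothetical helper for illustration; not part of treeHandler.
    """
    # str.endswith accepts a tuple, so normalise the extensions once up front.
    exts = tuple(ext.lower() for ext in extensions)
    folders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # any() stops at the first matching file, so a folder with many
        # images is not scanned in full.
        if any(name.lower().endswith(exts) for name in filenames):
            folders.append(dirpath)
    return folders
```

Each folder is visited once by os.walk, so no set is needed to remove duplicates. Calling get_image_folders('inputFolder') on the structure from the question should give the two folders that hold .jpg files, joined with the separator of the current platform.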