Print number of occurrences of any items in a list in paths


I am using os.walk to identify paths in a generic source directory (SRC) that contain any strings in my_list:

import os

SRC = '/User/dir_1/'

my_list = ["dog", "cat", "mouse", "bird"]

for dirpath, dirnames, filenames in os.walk(SRC):
    for folders in dirnames:
        for x in my_list:
            if x in folders:  # substring match on the directory name
                source_path = os.path.join(dirpath, folders)

And let's say that print(source_path) gives the following:

/User/dir_1/cat_test/
/User/dir_1/cat_test/bird_results/
/User/dir_1/dir_2/dog_test/
/User/dir_1/dir_2/dog_test/cat_results/
/User/dir_1/mouse_test/
/User/dir_1/mouse_test/mouse_results/
/User/dir_1/unknown_test/dog_results/
/User/dir_1/bird_files/
/User/dir_1/bird_files/bird_a_files/
/User/dir_1/bird_files/bird_b_files/

My goal is to shutil.move each source_path. However, moving /User/dir_1/bird_files/ and then trying to move /User/dir_1/bird_files/bird_a_files/ raises a FileNotFoundError, because the nested path no longer exists at its old location. I therefore want to keep only the paths that contain exactly one occurrence of any string in my_list, so that the remaining paths are:

/User/dir_1/cat_test/
/User/dir_1/dir_2/dog_test/
/User/dir_1/mouse_test/
/User/dir_1/unknown_test/dog_results/
/User/dir_1/bird_files/

I have tried source_path.count(x) == 1, but that counts each x in my_list separately instead of counting the occurrences of any x in my_list, so my output is (for example):

/User/dir_1/dir_2/dog_test/cat_results/ count == 1 (for dog)
/User/dir_1/dir_2/dog_test/cat_results/ count == 1 (for cat)
/User/dir_1/dir_2/dog_test/cat_results/ count == 0 (for mouse)
/User/dir_1/dir_2/dog_test/cat_results/ count == 0 (for bird)

but I want to see (for example):

/User/dir_1/dir_2/dog_test/cat_results/ count == 2 (for any x in my_list)

That would allow me to filter out any source_path with count != 1.


Solution

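  • Use a generator expression to pick out the terms that occur exactly once in each path, then sum the result (True counts as 1) and keep the path only if that sum is 1. Note that this counts matching terms rather than total occurrences: a path in which one term occurs once and another term occurs twice would still pass.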

    paths = """/User/dir_1/cat_test/
    /User/dir_1/cat_test/bird_results/
    /User/dir_1/dir_2/dog_test/
    /User/dir_1/dir_2/dog_test/cat_results/
    /User/dir_1/mouse_test/
    /User/dir_1/mouse_test/mouse_results/
    /User/dir_1/unknown_test/dog_results/
    /User/dir_1/bird_files/
    /User/dir_1/bird_files/bird_a_files/
    /User/dir_1/bird_files/bird_b_files/""".split()
    
    
    my_list = ["dog", "cat", "mouse", "bird"]
    
    out = []
    for path in paths:
        if sum(True for term in my_list if path.count(term) == 1) == 1:
            out.append(path)
    
    print(*out, sep='\n')
    

    Output

    /User/dir_1/cat_test/
    /User/dir_1/dir_2/dog_test/
    /User/dir_1/mouse_test/
    /User/dir_1/unknown_test/dog_results/
    /User/dir_1/bird_files/
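
The question asks for a single count over any term in my_list. A minimal sketch of that variant follows; it is not the answer's code. It sums the per-term counts and keeps a path only when the total is exactly 1. The helper name total_hits is hypothetical, and the path list is the sample data from the question.

```python
# Sketch: keep paths whose combined number of keyword hits is exactly one.
paths = [
    "/User/dir_1/cat_test/",
    "/User/dir_1/cat_test/bird_results/",
    "/User/dir_1/dir_2/dog_test/",
    "/User/dir_1/dir_2/dog_test/cat_results/",
    "/User/dir_1/mouse_test/",
    "/User/dir_1/mouse_test/mouse_results/",
    "/User/dir_1/unknown_test/dog_results/",
    "/User/dir_1/bird_files/",
    "/User/dir_1/bird_files/bird_a_files/",
    "/User/dir_1/bird_files/bird_b_files/",
]
my_list = ["dog", "cat", "mouse", "bird"]


def total_hits(path, terms):
    """Return the summed number of occurrences of every term in path."""
    return sum(path.count(term) for term in terms)


# A path survives only if all terms together occur exactly once in it.
kept = [p for p in paths if total_hits(p, my_list) == 1]
print(*kept, sep="\n")
```

On the sample data this should select the same five paths as the code above. The two conditions differ for a path in which one term occurs twice and another once: the total is then 3, so this variant drops it.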
    

    EDIT: From the comment, an os.walk approach.

    Idea: remove the matched directories, in place, from the dirs list that os.walk yields, so that the top-down walk never descends into them.

    Remark: the filter condition (marked by a comment in the code) is a plain substring test, which is fairly weak. In this particular case d.startswith(c) would be more robust; for more flexibility, use a regular expression.

    import os
    
    
    constraints = 'dog', 'cat', 'mouse', 'bird'
    
    wdir = './User' # your reference directory
    res = []
    for path, dirs, _ in os.walk(wdir, topdown=True):
        # local to each directory's content
        counter = dict.fromkeys(constraints, False)
        dirs_to_skip = []
        
        # filter by constraint
        for c in constraints:
            for d in dirs:
                if c in d: # <-- filter condition!
                    if not counter[c]: # 1st match
                        counter[c] = True
                        res.append(os.path.join(path, d))
    
                    dirs_to_skip.append(d)
        
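        # prune matched directories in place so os.walk skips them;
        # slice assignment is safe even if a name matched several constraints
        dirs[:] = [d for d in dirs if d not in dirs_to_skip]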
    
    print(*res, sep='\n')
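
The remark about startswith and regular expressions can be sketched as follows. This is an illustrative variant under stated assumptions, not the code above: the names KEYWORDS, PATTERN and topmost_matches are hypothetical, the pattern assumes a keyword must appear at the start of the directory name, and every matching sibling is collected (the code above keeps only the first match per keyword in each directory).

```python
import os
import re
import tempfile

# Assumption: a directory matches when its name starts with one of the keywords.
KEYWORDS = ("dog", "cat", "mouse", "bird")
PATTERN = re.compile(r"^(?:%s)" % "|".join(map(re.escape, KEYWORDS)))


def topmost_matches(root, pattern=PATTERN):
    """Collect every matching directory without descending into a match."""
    found = []
    for path, dirs, _ in os.walk(root, topdown=True):
        matched = [d for d in dirs if pattern.search(d)]
        found.extend(os.path.join(path, d) for d in matched)
        # Slice assignment mutates the list os.walk holds, which prunes the
        # walk; rebinding the name `dirs` would have no effect.
        dirs[:] = [d for d in dirs if d not in matched]
    return found


# Usage on a throwaway tree that mimics part of the question's layout.
with tempfile.TemporaryDirectory() as root:
    for rel in ("cat_test/bird_results", "dir_2/dog_test/cat_results",
                "bird_files/bird_a_files", "unknown_test/dog_results"):
        os.makedirs(os.path.join(root, rel))
    result = sorted(os.path.relpath(p, root) for p in topmost_matches(root))
    print(*result, sep="\n")
```

Because nested matches are never visited, each returned path could then be passed to shutil.move(path, destination) without hitting an already-moved parent; destination is a placeholder for wherever the directories should go.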