[SOLVED] list of entries (files and folders) in a directory tree by os.scandir() in Python

I have been using "os.walk()" to list all subfolders and files in a directory tree, but I heard that "os.scandir()" does the job 2 to 20 times faster. So I tried this code:

def tree2list (directory:str) -> list:
    import os
    tree = []
    counter = 0
    for i in os.scandir(directory):
        if i.is_dir():
            counter+=1
            tree.append ([counter,'Folder', i.name, i.path])  ## doesn't list the whole tree
            tree2list(i.path)
            #print(i.path)  ## this line prints all subfolders in the tree
        else:
            counter+=1
            tree.append([counter,'File', i.name, i.path])
            #print(i.path)  ## this line prints all files in the tree
    return tree

When I test it:

## tester
folder = 'E:/Test'
print(tree2list(folder))

I get only the contents of the root directory and nothing from the subdirectories further down the tree, even though the print statements in the code above (when uncommented) show every entry.

[[1, 'Folder', 'Archive', 'E:/Test\\Archive'], [2, 'Folder', 'Source', 'E:/Test\\Source']]

What have I done wrong, and how can I fix it?

Solution

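Your code almost works; only a minor modification is required. The recursive call tree2list(i.path) builds and returns a list for the subfolder, but your code discards that return value. Extend the parent's list with it instead: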

def tree2list(directory: str) -> list:
    import os
    tree = []
    counter = 0
    for i in os.scandir(directory):
        if i.is_dir():
            counter += 1
            tree.append([counter, 'Folder', i.name, i.path])
            tree.extend(tree2list(i.path))
            # print(i.path)  ## this line prints all subfolders in the tree
        else:
            counter += 1
            tree.append([counter, 'File', i.name, i.path])
            # print(i.path)  ## this line prints all files in the tree
    return tree

I don't understand the purpose of the counter variable, though, so I'd probably remove it. Note that it restarts at 1 in every recursive call, so the numbers are not unique across the tree.
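If the counter is meant to number entries across the whole tree, one option is to collect the rows without a counter and number them once at the end. This is only a sketch: the names list_entries and tree2list_numbered are made up for illustration, and it assumes a single running number over all files and folders is what you want.

```python
import os


def list_entries(directory: str) -> list:
    """Recursively collect [kind, name, path] rows, without numbering."""
    rows = []
    for entry in os.scandir(directory):
        if entry.is_dir():
            rows.append(['Folder', entry.name, entry.path])
            # Merge the subfolder's rows into this level's list.
            rows.extend(list_entries(entry.path))
        else:
            rows.append(['File', entry.name, entry.path])
    return rows


def tree2list_numbered(directory: str) -> list:
    """Prefix each row with a counter that runs over the whole tree."""
    return [[n, *row] for n, row in enumerate(list_entries(directory), start=1)]
```

The order of the rows follows whatever order os.scandir() returns, which is not guaranteed to be sorted.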

Further, I agree with @Gelineau that your approach copies lists quite heavily and is therefore likely to be slow on large trees: every level builds its own list, which is then copied into its parent's list. An iterator-based approach, as in his response, is better suited to a large number of files.
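For reference, here is a minimal sketch of what an iterator-based version could look like. It is not @Gelineau's code; the name iter_tree is made up, and passing follow_symlinks=False is my own assumption to avoid looping forever on symlink cycles (it means a symlink to a directory is reported as a 'File' and not descended into).

```python
import os
from typing import Iterator, Tuple


def iter_tree(directory: str) -> Iterator[Tuple[str, str, str]]:
    """Lazily yield (kind, name, path) for every entry below `directory`."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # Assumption: do not follow symlinks, so link cycles cannot recurse forever.
            if entry.is_dir(follow_symlinks=False):
                yield ('Folder', entry.name, entry.path)
                # Delegate to the sub-generator instead of building and copying a list.
                yield from iter_tree(entry.path)
            else:
                yield ('File', entry.name, entry.path)
```

Entries are produced one at a time, so nothing is copied between levels. If you still need a list, list(iter_tree(folder)) builds it once, and enumerate(iter_tree(folder), start=1) gives you a running number for free.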