pythonrecursion

Convert a python non-tail recursive function adding sub-folders info to a loop


I wrote a function to get the total size and total files/subdirs of the entire disk or specific folder. The recursive function was very simple:

# Make the python dirSize function a non recursive one:
from pathlib import Path

obj = {}
def dirSize(dir: Path, size = 0, files = 0, dirs = 0):
    for sub_dir in dir.glob('*'):
        if sub_dir.is_dir():
            dirs += 1
            res_size, res_files, res_dirs = dirSize(sub_dir)
            size += res_size
            files += res_files
            dirs += res_dirs
        elif sub_dir.is_file():
            files += 1
            size += sub_dir.stat().st_size
    
    obj[dir.as_posix()] = {
        'size': size,
        'files': files,
        'dirs': dirs,
    }
    
    return size, files, dirs

dirSize(Path('~/Documents').expanduser())
#Line above is for tests. Final code should run in elevated cmd:
#dirSize(Path('C:/'))

with open(Path('~/Desktop/results.tsv').expanduser().as_posix(), 'w') as file:
    file.write('Directory\tBytes\tFiles\tDirs\n')
    for key in sorted(obj):
        dir = obj[key]
        file.write(key)
        file.write('\t')
        file.write(str(dir['size']))
        file.write('\t')
        file.write(str(dir['files']))
        file.write('\t')
        file.write(str(dir['dirs']))
        file.write('\n')

I tried to use AI to convert it to a non recursive function, but AI seems to ignore these lines are a bottom up sum, not a simple sum:

size += res_size
files += res_files
dirs += res_dirs

So I tried a stack loop, like in the Dijkstra algorithm, but it turns up that I couldn't find a simple solution to make the bottom up sum. Alternatively I could add level/parent/child properties to the obj variable to make a bottom up sum after the queue processing. But this seems very cumbersome. Is there a simpler way to achieve this?




Replying questions asked below:

  1. Why don't you just use os.walk?
  1. What's the problem with the recursive function?

Solution

  • Your thought to use a stack approach was right - this can be achieved by creating a stack with tuple entries. Each tuple holds the subdirectory to process, as well as the name of the parent directory, so that the parent's stats can be updated accordingly. In order to update the parent's stats after all of its children's stats have been computed, we can add the parent entry back to the stack to be processed a second time. Here's an example iterative approach:

    obj = {}
    
    def dirSizeIterative(dir: Path):
        dir_stack = [(dir, 'root')]
        obj['root'] = {'size': 0, 'files': 0, 'dirs': 0}
    
        while dir_stack:
            sub_dir, parent_dir_name = dir_stack.pop(0)
            parent_info = obj[parent_dir_name]
    
            if sub_dir.is_dir():
                # if this exists, then we are processing this dir for the second time, after children have all been computed
                if sub_info := obj.get(sub_dir.as_posix()):
                    parent_info['size'] += sub_info['size']
                    parent_info['files'] += sub_info['files']
                    parent_info['dirs'] += sub_info['dirs']
    
                    continue
    
                sub_info = {'size': 0, 'files': 0, 'dirs': 0}
                obj[sub_dir.as_posix()] = sub_info
    
                parent_info['dirs'] += 1
    
                # add this dir back to the stack for processing after children have been computed
                dir_stack.insert(0, (sub_dir, parent_dir_name))
    
                for child in sub_dir.glob('*'):
                    dir_stack.insert(0, (child, sub_dir.as_posix()))
            elif sub_dir.is_file():
                parent_info['files'] += 1
                parent_info['size'] += sub_dir.stat().st_size
    
        del obj['root']
    
    dirSizeIterative(Path('~/Documents').expanduser())