I have a dataset under a single root directory and am trying to iterate through its directories and files, but the topdown parameter does not work as I expected.
Images
├── n01440764
│   ├── image1.jpg
│   ├── ...
│   └── image50.jpg
└── n01443537
    ├── image1.jpg
    ├── ...
    └── image50.jpg
import os

image_dir = os.walk("Images", topdown=False)
for root, dirs, files in image_dir:
    for d in dirs:
        print(d)
Output:
n01443537
n01440764
Whether I pass topdown=False or topdown=True, the result is the same. But I am expecting:
n01440764
n01443537
The topdown option does not change the order in which directories at the same level are walked. Instead, it determines whether the (dirpath, dirnames, filenames) tuple for a directory is generated before or after the tuples for its subdirectories. With the default (topdown=True), you can modify dirnames in place to filter out some paths and not walk them. With topdown=False, the subdirectories are walked first, so that sort of filtering is not possible.
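To make the difference concrete, here is a minimal sketch that records the order in which os.walk() yields directories for both settings. The helper name walk_order and the temporary directory layout are assumptions made for illustration; they only mirror the layout in the question.

```python
import os
import tempfile


def walk_order(top, topdown):
    """Return directory paths (relative to top) in the order os.walk() yields them."""
    return [
        os.path.relpath(root, top)
        for root, _dirs, _files in os.walk(top, topdown=topdown)
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout mirroring the question: Images/n01440764, Images/n01443537
    for name in ("n01440764", "n01443537"):
        os.makedirs(os.path.join(tmp, "Images", name))
    top_down = walk_order(tmp, topdown=True)
    bottom_up = walk_order(tmp, topdown=False)

print(top_down[0])    # the top directory is yielded first
print(bottom_up[-1])  # the top directory is yielded last
```

Both lists contain the same directories. Only the position of each parent relative to its children changes, and the order of the two sibling directories is whatever the file system returns.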
From the official documentation for os.walk():
If optional argument topdown is True or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top-down). If topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up). No matter the value of topdown, the list of subdirectories is retrieved before the tuples for the directory and its subdirectories are generated.
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.
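The in-place modification described in the documentation can be sketched as follows. This is only an illustration: the function name walk_sorted, the skip parameter, and the directory names a, b, and cache are hypothetical. It relies on topdown=True so that os.walk() consults the modified list before descending.

```python
import os
import tempfile


def walk_sorted(top, skip=()):
    """Yield directory paths under top, visiting siblings alphabetically
    and not descending into any directory whose name is in skip."""
    for root, dirs, _files in os.walk(top, topdown=True):
        # Slice assignment edits the very list os.walk() uses to decide
        # where to go next, so this both prunes and orders the traversal.
        dirs[:] = sorted(d for d in dirs if d not in skip)
        yield root


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: three sibling directories, each with one child
    for name in ("b", "a", "cache"):
        os.makedirs(os.path.join(tmp, name, "inner"))
    visited = [os.path.relpath(p, tmp) for p in walk_sorted(tmp, skip={"cache"})]

print(visited)
```

Rebinding the name instead (dirs = sorted(dirs)) would create a new list and leave the traversal unchanged, which is why the slice assignment matters here.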
As of Python 3.5, os.walk() uses os.scandir(), which yields directory entries in arbitrary order:
The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory
Prior to Python 3.5, it used os.listdir(), which also returned the directories in arbitrary order. See also the answer to this question: In what order does os.walk iterates iterate?
You can get the directories in the order you want by using for d in sorted(dirs) within your os.walk() loop.
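Applied to the loop from the question, that might look like the sketch below. The function name sorted_subdirs is an assumption for illustration, and the temporary directory stands in for the real dataset.

```python
import os
import tempfile


def sorted_subdirs(top):
    """Collect subdirectory names, alphabetically within each directory walked."""
    names = []
    for root, dirs, files in os.walk(top):
        # sorted() returns a new list; dirs itself is left untouched
        for d in sorted(dirs):
            names.append(d)
    return names


with tempfile.TemporaryDirectory() as tmp:
    images = os.path.join(tmp, "Images")
    # Created in reverse order on purpose; creation order is not relied upon
    for name in ("n01443537", "n01440764"):
        os.makedirs(os.path.join(images, name))
    names = sorted_subdirs(images)

for name in names:
    print(name)
```

Because sorted() works on a copy, this only orders the names printed for each directory. For a flat layout like the one in the question that is sufficient. With deeper nesting, the order in which os.walk() enters subdirectories is unaffected unless dirs is sorted in place (for example with dirs.sort()) while topdown is True.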