
How to sort os.walk(path) in alphanumeric order, with duplicates coming after the original file, using Python 3?


In Python 3 (specifically 3.10.6), how can you change the order of the files that os.walk(path) finds? Given this list of files:

IMG0001.jpg
IMG0002.jpg
IMG0002(1).jpg
IMG0002(2).jpg
IMG0003.jpg

How would you sort it in that order, with each (n) duplicate file coming after the original file? Currently, os.walk(path) is sorting this list like this:

IMG0001.jpg
IMG0002(1).jpg
IMG0002(2).jpg
IMG0002.jpg
IMG0003.jpg

I suppose the main issue is that the default sort places the ( (and also -) ahead of the . in the extension. If that is what's happening here, how would you change which special characters come before others?

I've tried sorted(files), but that gives the same order that os.walk(path) already produces. With sorted(files, reverse=True), the originals do come before the duplicates, but multiple duplicates are now in reverse order and so are the originals, i.e.:

IMG0003.jpg
IMG0002.jpg
IMG0002(2).jpg
IMG0002(1).jpg
IMG0001.jpg

Solution

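  • String ordering is lexicographic by code point, and both ( (U+0028) and - (U+002D) sort before . (U+002E), which is why IMG0002(1).jpg lands ahead of IMG0002.jpg. You'll need a custom sort key if you want something different. It's a little trickier than expected, but something like this should work: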

    import os
    import re
    
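    def key(fname):
        # Sort key: (base name, extension, duplicate number); originals get 0.
        basename, ext = os.path.splitext(fname)
        v = 0
        # A trailing "(n)" marks a duplicate: strip it and keep n as an integer.
        if m := re.match(r"(.*)\((\d+)\)$", basename):
            basename, v = m.groups()
            v = int(v)
        return basename, ext, v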
    

    Now you should be able to use something like files.sort(key=key) on the files list that os.walk gives you for each directory.
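
To show how this might fit together, here is a minimal sketch that applies the key to the sample file names from the question and then to an os.walk loop. The walk_sorted helper and the names/ordered variables are hypothetical names made up for this illustration; key is repeated so the snippet runs on its own. Note that os.walk does not promise any particular order for the names it yields, so sorting them yourself is the reliable route.

```python
import os
import re


def key(fname):
    # Same key as above: (base name, extension, duplicate number).
    basename, ext = os.path.splitext(fname)
    v = 0
    if m := re.match(r"(.*)\((\d+)\)$", basename):
        basename, v = m.groups()
        v = int(v)
    return basename, ext, v


# The file names from the question, deliberately shuffled.
names = [
    "IMG0002(2).jpg",
    "IMG0003.jpg",
    "IMG0002.jpg",
    "IMG0001.jpg",
    "IMG0002(1).jpg",
]
ordered = sorted(names, key=key)


def walk_sorted(path):
    """Yield file paths under path, originals before their (n) duplicates.

    Hypothetical helper for illustration only.
    """
    for root, dirs, files in os.walk(path):
        # Sorting dirs in place also fixes the order in which os.walk
        # descends into subdirectories (this relies on topdown=True,
        # which is the default).
        dirs.sort()
        for name in sorted(files, key=key):
            yield os.path.join(root, name)
```

With the sample names, ordered comes out as IMG0001.jpg, IMG0002.jpg, IMG0002(1).jpg, IMG0002(2).jpg, IMG0003.jpg. Because the extension is compared before the duplicate number, files that share a base name but differ in extension are grouped by extension first; swap the last two tuple items if you would rather group by duplicate number.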