python natsort

How to use natsort in python to sort folder names?


I have three folders, whose names are ["-folder2-", "-folder1-", "-folder-"].

When I use 'sorted', or look at them in Windows, the order is ["-folder-", "-folder1-", "-folder2-"]. But natsort returns ["-folder1-", "-folder2-", "-folder-"].

I want to get the same result using natsort. How can I do it?

a = ["-folder1-", "-folder2-", "-folder-"]
import natsort
sorting = natsort.natsorted(a, alg = natsort.ns.PATH | natsort.ns.LOCALE | natsort.ns.IGNORECASE)
print(sorted(a)) #---> ["-folder-", "-folder1-", "-folder2-"]
print(sorting) #---> ["-folder1-", "-folder2-", "-folder-"]

Solution

  • Before I answer your question, I want to explain what is going on. natsort looks for numbers in your input and separates them from the non-numeric components. The easiest way to see this is to look at the output of the natural sorting key. (I omitted the PATH and LOCALE options because they make the key output much harder to read.)

    >>> import natsort
    >>> ns_key = natsort.natsort_keygen(alg=natsort.IGNORECASE)
    >>> a = ["-folder1-", "-folder2-", "-folder-"]
    >>> [ns_key(x) for x in a]
    [('-folder', 1, '-'), ('-folder', 2, '-'), ('-folder-',)]
    

    Python compares these tuples element by element, starting with the first. When '-folder' is compared against '-folder-', the former sorts first because it is a prefix of the latter, so your folders with numbers get placed first.
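
The comparison described above can be reproduced with plain Python, no natsort required. This is only a small illustration that reuses the key tuples printed earlier; the variable names are made up for the example.

```python
# Key tuples copied from the natsort_keygen output shown above.
key_numbered = ("-folder", 1, "-")   # key for "-folder1-"
key_plain = ("-folder-",)            # key for "-folder-"

# Strings are compared character by character, and a string that is a
# proper prefix of another compares as smaller.
first_elements_differ = key_numbered[0] < key_plain[0]

# Tuples are compared element by element, so the first element already
# decides the order and the remaining elements are never looked at.
numbered_sorts_first = key_numbered < key_plain

print(first_elements_differ, numbered_sorts_first)
print(sorted([key_plain, key_numbered]))
```

Both comparisons come out true, which is why the numbered folders end up ahead of '-folder-'.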

    To answer your question, we need to trick natsort into thinking that '-' followed by no numbers should be treated like the case with numbers. One way to do that is with regex.

    >>> import re
    >>> r = re.compile(r"(?<!\d)-")
    >>> # What does the regex do?
    >>> [r.sub(r"0\g<0>", x) for x in a]
    ['0-folder1-', '0-folder2-', '0-folder0-']
    >>> # What does natsort generate?
    >>> ns_key = natsort.natsort_keygen(key=lambda x: r.sub(r"0\g<0>", x), alg=natsort.IGNORECASE)
    >>> [ns_key(x) for x in a]
    [('', 0, '-folder', 1, '-'), ('', 0, '-folder', 2, '-'), ('', 0, '-folder', 0, '-')]
    >>> # Does it actually work?
    >>> natsort.natsorted(a, key=lambda x: r.sub(r"0\g<0>", x), alg=natsort.ns.PATH | natsort.ns.LOCALE | natsort.ns.IGNORECASE)
    ['-folder-', '-folder1-', '-folder2-']
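
To see why the padding trick works without relying on natsort, here is a minimal standard-library sketch. `simple_natural_key` is a hypothetical, heavily simplified stand-in for natsort's key (it is not natsort's implementation), and `padded` just wraps the same regex substitution used above.

```python
import re

DASH_WITHOUT_DIGIT = re.compile(r"(?<!\d)-")  # same pattern as in the answer
DIGIT_RUNS = re.compile(r"(\d+)")

def padded(name):
    """Insert a 0 before every '-' that is not preceded by a digit."""
    return DASH_WITHOUT_DIGIT.sub(r"0\g<0>", name)

def simple_natural_key(name):
    """Assumed stand-in for a natural-sort key: alternating text and int chunks."""
    chunks = DIGIT_RUNS.split(padded(name).lower())
    # re.split with a capture group puts the digit runs at odd indexes,
    # so text is always compared with text and ints with ints.
    return tuple(int(c) if i % 2 else c for i, c in enumerate(chunks))

folders = ["-folder1-", "-folder2-", "-folder-"]
result = sorted(folders, key=simple_natural_key)
print(result)
```

After padding, every name produces a key of the same shape, so '-folder-' is compared as if it carried the number 0 and lands before '-folder1-'.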
    

    An alternative method would be to "split" your input on '-', which would have a similar effect. This is one of the things that PATH does under the hood, but for file separators.

    >>> # What does natsort generate?
    >>> ns_key = natsort.natsort_keygen(key = lambda x: x.split('-'), alg=natsort.IGNORECASE)
    >>> [ns_key(x) for x in a]
    [((), ('folder', 1), ()), ((), ('folder', 2), ()), ((), ('folder',), ())]
    >>> # Does it actually work?
    >>> natsort.natsorted(a, key=lambda x: x.split('-'), alg=natsort.ns.PATH | natsort.ns.LOCALE | natsort.ns.IGNORECASE)
    ['-folder-', '-folder1-', '-folder2-']
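
The effect of splitting can also be sketched with the standard library alone. `chunk_key` and `split_key` are hypothetical helpers written for this illustration; the keys they build resemble, but do not exactly match, the nested tuples natsort prints above (natsort drops empty strings, this sketch keeps them).

```python
import re

DIGIT_RUNS = re.compile(r"(\d+)")

def chunk_key(piece):
    """Turn one piece into alternating text/int chunks (text at even indexes)."""
    chunks = DIGIT_RUNS.split(piece.lower())
    return tuple(int(c) if i % 2 else c for i, c in enumerate(chunks))

def split_key(name, sep="-"):
    """Split on the separator first, then build a key for each piece."""
    return tuple(chunk_key(piece) for piece in name.split(sep))

folders = ["-folder1-", "-folder2-", "-folder-"]
result = sorted(folders, key=split_key)
print(result)
```

Because the trailing '-' is no longer part of the text chunk, 'folder' with no number yields a shorter tuple than 'folder' followed by a number, and shorter prefixes sort first.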
    

    You may be wondering why PATH does not automatically take care of this. PATH was intended to handle oddities that arise because of file separators or file extensions. Your examples have neither, so it does not help. If the examples given here are representative, I would recommend removing the PATH option since it will only add runtime but give no benefit.