[SOLVED] How to sort 'WindowsPath' object files naturally

I am iterating through files in a directory using Path().glob(), and it is not iterating in natural order. For example, it iterates like this:

[WindowsPath('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P11/dataP11/SAMPLED_NORMALIZED/P11_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P12/dataP12/SAMPLED_NORMALIZED/P12_Cor.csv'),
# ...and so on from P1 to P30

I want it to iterate like this instead: P1, P2, P3, and so on.

I have tried using the code below but it gives me an error:

from pathlib import Path

file_path = r'C:/Users/HP/Desktop'

files = Path(file_path).glob(file)
sorted(files, key=lambda name: int(name[10:]))

where 10 is just an arbitrary number, as I am only trying out the code.

The error:

TypeError: 'WindowsPath' object is not subscriptable

Ultimately, what I want is to iterate through the files and do something with each file:

from pathlib import Path

for i, fl in enumerate(Path(file_path).glob(file)):
    # do something

I have even tried the library natsort but it's not ordering the files correctly in the iteration. I have tried:

from natsort import natsort_keygen, ns
natsort_key1 = natsort_keygen(key=lambda y: y.lower())

from natsort import natsort_keygen, ns
natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)

Both snippets above still give me P1, P10, P11, and so on.

Any help would really be appreciated.

Solution

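The TypeError occurs because a Path object cannot be sliced like a string; you have to work with str(path) or with one of its string attributes, such as Path.name. If you want to sort by the digits in the file name, you can use the Path.name attribute and a regular expression that extracts the digits.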

from pathlib import Path
import re

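# Start from the common parent folder: each P<n>_Cor.csv lives in its own P<n> subfolder.
file_path = r'C:/Users/HP/Desktop'

def _p_file_sort_key(path):
    """Given a Path whose name has the form P(digits)_Cor.csv, return digits as an int."""
    return int(re.match(r"P(\d+)", path.name).group(1))

# sorted() consumes the generator returned by glob() and returns a list.
files = sorted(Path(file_path).glob("P*/dataP*/SAMPLED_NORMALIZED/P*_Cor.csv"), key=_p_file_sort_key)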
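
If the file names do not all follow the P(digits) pattern, a more general natural-sort key may be useful. The sketch below is one possible approach, not part of the original answer: the helper name natural_key is hypothetical, and the sample paths are built with PureWindowsPath only so the example runs on any operating system without touching the disk. It splits the file name into alternating text and number chunks and compares the number chunks as integers.

```python
import re
from pathlib import PureWindowsPath


def natural_key(path):
    """Hypothetical helper: split path.name into text and integer chunks.

    re.split with a capturing group returns text chunks at even indexes
    and the captured digit runs at odd indexes, so matching positions
    always hold the same type and the resulting lists compare safely.
    """
    parts = re.split(r"(\d+)", path.name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


# Sample paths modelled on the ones in the question (no files are read).
paths = [
    PureWindowsPath(f"C:/Users/HP/Desktop/P{n}/dataP{n}/SAMPLED_NORMALIZED/P{n}_Cor.csv")
    for n in (10, 2, 1, 11)
]

ordered = sorted(paths, key=natural_key)

# Iterate in natural order and do something with each file.
for i, fl in enumerate(ordered):
    print(i, fl.name)
```

With real files, the same key can be passed to sorted() around the Path(file_path).glob(...) call, and the resulting list can be looped over with enumerate() as in the question.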