I am iterating through files in a directory using Path().glob(), but it does not iterate in natural order. For example, it iterates like this:
[WindowsPath('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
WindowsPath('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv'),
WindowsPath('C:/Users/HP/Desktop/P11/dataP11/SAMPLED_NORMALIZED/P11_Cor.csv'),
WindowsPath('C:/Users/HP/Desktop/P12/dataP12/SAMPLED_NORMALIZED/P12_Cor.csv'),
# ...and so on from P1 to P30
I want it to iterate like this instead: P1, P2, P3 and so on.
I have tried using the code below but it gives me an error:
from pathlib import Path
file_path = r'C:/Users/HP/Desktop'
files = Path(file_path).glob(file)
sorted(files, key=lambda name: int(name[10:]))
where 10 is just an arbitrary number I used while trying out the code.
The error:
TypeError: 'WindowsPath' object is not subscriptable
Ultimately, what I want is to iterate through the files and do something with each file:
from pathlib import Path
for i, fl in enumerate(Path(file_path).glob(file)):
    # do something
I have even tried the natsort library, but it does not order the files correctly during iteration. I have tried:
from natsort import natsort_keygen, ns
natsort_key1 = natsort_keygen(key=lambda y: y.lower())
from natsort import natsort_keygen, ns
natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)
Both snippets above still give me P1, P10, P11 and so on.
Any help would really be appreciated.
The TypeError occurs because a Path object cannot be sliced like a string. If you want to sort by the digits in the file name, you can use the Path.name attribute, which is a plain str, and a regular expression that extracts the digits.
from pathlib import Path
import re
file_path = r'C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/'
def _p_file_sort_key(file_path):
    """Given a file in the form P(digits)_Cor.csv, return digits as an int"""
    return int(re.match(r"P(\d+)", file_path.name).group(1))
files = sorted(Path(file_path).glob("P*_Cor.csv"), key=_p_file_sort_key)
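The paths in the question sit in separate P<n> folders under the Desktop directory, so a single-folder glob would only ever match one file. Here is a minimal sketch of the same numeric sort key applied across all folders and then used in an enumerate loop. The glob pattern is an assumption inferred from the paths shown in the question, and the names participant_number and sorted_cor_files are hypothetical helpers. The demo builds a throwaway directory tree so it can run anywhere.

```python
import re
import tempfile
from pathlib import Path


def participant_number(path: Path) -> int:
    """Return the digits in a name like P12_Cor.csv as an int (12)."""
    # path.name is a plain str, so regular expressions (or slicing) work on it.
    match = re.match(r"P(\d+)", path.name)
    if match is None:
        raise ValueError(f"unexpected file name: {path.name}")
    return int(match.group(1))


def sorted_cor_files(root: Path) -> list[Path]:
    """Collect every P<n>_Cor.csv below root, ordered by n."""
    # Assumed layout: <root>/P<n>/dataP<n>/SAMPLED_NORMALIZED/P<n>_Cor.csv
    pattern = "P*/dataP*/SAMPLED_NORMALIZED/P*_Cor.csv"
    return sorted(root.glob(pattern), key=participant_number)


# Demo on a temporary tree that mimics the layout in the question.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for n in (1, 2, 10, 11, 3):
        folder = root / f"P{n}" / f"dataP{n}" / "SAMPLED_NORMALIZED"
        folder.mkdir(parents=True)
        (folder / f"P{n}_Cor.csv").touch()

    ordered_names = [fl.name for fl in sorted_cor_files(root)]
    for i, name in enumerate(ordered_names):
        print(i, name)  # do something with each file here
```

With real data, you would call sorted_cor_files(Path(r'C:/Users/HP/Desktop')) and loop over the returned Path objects directly. Because sorted() consumes the glob generator and returns a list, the result can be iterated more than once.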