I have this data in pandas
data = [
['ID', 'Time', 'oneMissing', 'singleValue', 'empty', 'oneEmpty'],
['CS1-1', 1, 10000, None, None, 0],
['CS1-2', 2, 20000, 0.0, None, 0],
['CS1-1', 2, 30000, None, None, 0],
['CS1-2', 1, 10000, None, None, None],
['CS1-11', 1, None, 0.0, None, None],
['CS1-2', 3, 30000, None, None, None]
]
I'm trying to sort it by the ID and Time columns, so the result should look like this:
'CS1-1', 1, 10000, None, None, 0
'CS1-1', 2, 30000, None, None, 0
'CS1-2', 1, 10000, None, None, None
'CS1-2', 2, 20000, 0.0, None, 0
'CS1-2', 3, 30000, None, None, None
'CS1-11', 1, None, 0.0, None, None
I'm using a pandas DataFrame for the sorting and have also tried combining it with natsort, but I can't get it to work. Either I get errors that the index contains duplicates (I use ID as the index), or the IDs are sorted as plain strings.
The ID here is just an example. I don't know what format it will have; it might be NUMBER-LETTER or NUMBER LETTER NUMBER. I just need every run of digits to be compared as a number. I've looked at "natsort", which seems to sort a plain list correctly, so I think it should be possible to use it to sort the IDs and then re-index the data.
I've looked at multiple sources such as "Alphanumeric sorting" and "Sort dataframes", but without any luck.
Use str.extract and sort_values, then use the resulting index to reindex df.
idx = (df.assign(ID2=df.ID.str.extract(r'(\d+)$', expand=False).astype(int))
         .sort_values(['ID2', 'Time'])
         .index)
# idx holds index labels, so select by label rather than by position
df.loc[idx]
       ID  Time  oneMissing  singleValue empty  oneEmpty
0   CS1-1     1     10000.0          NaN  None       0.0
2   CS1-1     2     30000.0          NaN  None       0.0
3   CS1-2     1     10000.0          NaN  None       NaN
1   CS1-2     2     20000.0          0.0  None       0.0
5   CS1-2     3     30000.0          NaN  None       NaN
4  CS1-11     1         NaN          0.0  None       NaN
This is under the assumption that your ID column follows the pattern "XXX-NUMBER".
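For reference, here is a self-contained sketch of the approach above, built from the sample data in the question. It assumes the default RangeIndex and that every ID ends in digits; the helper column name ID2 is only illustrative and is dropped again at the end.

```python
import pandas as pd

data = [
    ['ID', 'Time', 'oneMissing', 'singleValue', 'empty', 'oneEmpty'],
    ['CS1-1', 1, 10000, None, None, 0],
    ['CS1-2', 2, 20000, 0.0, None, 0],
    ['CS1-1', 2, 30000, None, None, 0],
    ['CS1-2', 1, 10000, None, None, None],
    ['CS1-11', 1, None, 0.0, None, None],
    ['CS1-2', 3, 30000, None, None, None],
]
# First row is the header, the rest are records.
df = pd.DataFrame(data[1:], columns=data[0])

# Assumption: each ID ends in a run of digits ("XXX-NUMBER").
# expand=False makes str.extract return a Series instead of a DataFrame.
id_num = df['ID'].str.extract(r'(\d+)$', expand=False).astype(int)

result = (df.assign(ID2=id_num)            # temporary numeric sort key
            .sort_values(['ID2', 'Time'])  # numeric ID first, then Time
            .drop(columns='ID2'))          # remove the helper column
print(result)
```

Sorting on the helper column directly avoids the separate reindexing step, so it also works when the index is not a plain 0..n-1 range.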
A more general solution uses the natsort module, which excels at fast natural sorting and does not rely on a fixed ID pattern. With a little elbow grease, we can argsort your data.
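from natsort import natsorted

# Sort (index label, ID, Time) triples naturally by ID, then by Time,
# and keep only the reordered index labels.
idx, *_ = zip(*natsorted(
    zip(df.index, df.ID, df.Time), key=lambda x: (x[1], x[2])))
df.loc[list(idx)]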
       ID  Time  oneMissing  singleValue empty  oneEmpty
0   CS1-1     1     10000.0          NaN  None       0.0
2   CS1-1     2     30000.0          NaN  None       0.0
3   CS1-2     1     10000.0          NaN  None       NaN
1   CS1-2     2     20000.0          0.0  None       0.0
5   CS1-2     3     30000.0          NaN  None       NaN
4  CS1-11     1         NaN          0.0  None       NaN
Install it from PyPI: pip install natsort.
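If adding a dependency is not an option, a natural sort key can be approximated with the standard library. The sketch below is not natsort's implementation; natural_key and the temporary _rank column are hypothetical names, and it assumes the IDs are plain strings in which only runs of digits need to be compared numerically.

```python
import re
import pandas as pd

def natural_key(text):
    """Split text into alternating non-digit / digit runs; digits become ints."""
    parts = re.split(r'(\d+)', text)
    # re.split with a capturing group alternates: even positions are
    # non-digit text (possibly empty), odd positions are digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

df = pd.DataFrame({
    'ID':   ['CS1-1', 'CS1-2', 'CS1-1', 'CS1-2', 'CS1-11', 'CS1-2'],
    'Time': [1, 2, 2, 1, 1, 3],
})

# Rank each distinct ID by its natural order, then sort on (rank, Time).
order = {value: rank
         for rank, value in enumerate(sorted(df['ID'].unique(), key=natural_key))}
result = (df.assign(_rank=df['ID'].map(order))
            .sort_values(['_rank', 'Time'])
            .drop(columns='_rank'))
print(result)
```

Ranking the unique IDs first keeps the pandas side to an ordinary integer sort, so duplicate IDs and non-default indexes are not a problem.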