Given this input
df = pd.DataFrame(
[["1 (2)", "4"], ["5 (3)", "2"], ["4 (2)", "4"], ["1 (1)", "4"], ["1 (2)", "3"]],
columns=["a", "b"],
)
a b
0 1 (2) 4
1 5 (3) 2
2 4 (2) 4
3 1 (1) 4
4 1 (2) 3
to arrive at this output
expected = pd.DataFrame(
[["1 (2)", "4", 2], ["5 (3)", "2", 0], ["4 (2)", "4", 1], ["1 (1)", "4", 4], ["1 (2)", "3", 3]],
columns=["a", "b", "c"],
)
a b c
0 1 (2) 4 2
1 5 (3) 2 0
2 4 (2) 4 1
3 1 (1) 4 4
4 1 (2) 3 3
Where the c column contains the order of the items in column a which is natsorted. I need to do this:
indexes_human_sorted = natsort.index_humansorted(df["a"], reverse=True) # [1, 2, 0, 4, 3] gets the values in order x[1], x[2], x[0] etc.
for i, x in enumerate(indexes_human_sorted):
df.loc[x, "c"] = i
a b c
0 1 (2) 4 2.0
1 5 (3) 2 0.0
2 4 (2) 4 1.0
3 1 (1) 4 4.0
4 1 (2) 3 3.0
I believe that loop (which is slow) over the indexes should be unnecessary and I think I'm missing some trick from natsort.
you can use np.argsort
I believe:
df['c'] = np.argsort(natsort.index_humansorted(df['a'],reverse=True))
print(df)
a b c
0 1 (2) 4 2
1 5 (3) 2 0
2 4 (2) 4 1
3 1 (1) 4 4
4 1 (2) 3 3