pandas dataframe sorting

Pandas Persistent Sorting


I have a text file with 8 columns

# 0.y[m] 1.uy[m_e*c] 2.x[m] 3.ux[m_ec] 4.z[m] 5.uz[m_ec] 6.w 7.ID.

In this file, the first line is commented out with # and the data columns have no labels. I import this file into a data frame using

data = pd.read_csv("file.txt", sep='\t', comment='#', header=None)

and displaying it I see:

[screenshot: the data frame as displayed]

Now, I need to sort in ascending order by the last column, so I use column label 7 and this line of code

dataS = data.sort_values(7)

and sorting works because now I see

[screenshot: the data frame sorted by column 7]

but the sorting does not appear to be persistent, because both data frames return the same value for row 0:

data[7][0] = 24641553

and

dataS[7][0] = 24641553

I need to use the sorted data frame one row after the other, exactly in the sorted order, so my code will rely on a for loop which uses dataS[7][i] where i = 0, 1, 2, ...

Code is below.

import pandas as pd
data = pd.read_csv("file.txt", sep='\t', comment='#', header=None)
dataS = data.sort_values(7)

Sample text file looks like this:

#0.y[m] 1.uy[m_e*c] 2.x[m] 3.ux[m_ec] 4.z[m] 5.uz[m_ec] 6.w 7.ID
4.800773e-06    5.825619e+00    9.693396e-06    1.732705e+00    1.068944e-05    -3.532225e+00   1.255580e+04    24641553
4.359847e-06    1.275340e+01    9.564333e-06    -3.591681e-01   9.690643e-06    7.398885e+00    1.255580e+04    18676620

My problem is that I don't know how to loop over the sorted data frame, called dataS here, in sorted order. Can anyone help please? Thanks!


Solution

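  • The sorting does persist. The issue is that sort_values reorders the rows together with their index labels, so the row labeled 0 is still the original first row, and dataS[7][0] looks up label 0 rather than the first position: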

    dataS = data.sort_values(7)
    
              0          1        2  ...         5        6         7
    1  0.000004  12.753400  0.00001  ...  7.398885  12555.8  18676620
    0  0.000005   5.825619  0.00001  ... -3.532225  12555.8  24641553
    

    If you want the sorted rows relabeled 0, 1, 2, ... instead of keeping their original index labels, use ignore_index=True (available since pandas 1.0):

    dataS = data.sort_values(7, ignore_index=True)
    
              0          1        2  ...         5        6         7
    0  0.000004  12.753400  0.00001  ...  7.398885  12555.8  18676620
    1  0.000005   5.825619  0.00001  ... -3.532225  12555.8  24641553
    

    dataS[7][0] will output:

    18676620
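
To make the difference between label-based and position-based access concrete, here is a minimal sketch built from the two sample rows in the question. The variable names (sample, by_label, by_position, ids_in_order, dataS_reset, ids_by_loop) are illustrative assumptions, and the file is replaced by an in-memory string so the snippet runs on its own. It also shows two ways to walk the rows in sorted order that should work even without ignore_index=True.

```python
import io

import pandas as pd

# The two sample rows from the question, tab-separated, with the commented header.
# An in-memory string stands in for "file.txt" (assumption for a self-contained demo).
sample = (
    "#0.y[m]\t1.uy[m_e*c]\t2.x[m]\t3.ux[m_ec]\t4.z[m]\t5.uz[m_ec]\t6.w\t7.ID\n"
    "4.800773e-06\t5.825619e+00\t9.693396e-06\t1.732705e+00\t"
    "1.068944e-05\t-3.532225e+00\t1.255580e+04\t24641553\n"
    "4.359847e-06\t1.275340e+01\t9.564333e-06\t-3.591681e-01\t"
    "9.690643e-06\t7.398885e+00\t1.255580e+04\t18676620\n"
)

data = pd.read_csv(io.StringIO(sample), sep="\t", comment="#", header=None)
dataS = data.sort_values(7)  # rows are reordered, original index labels are kept

# Label-based lookup: label 0 still refers to the original first row.
by_label = dataS.loc[0, 7]

# Position-based lookup: position 0 is the first row in sorted order.
by_position = dataS.iloc[0, 7]

# Iterate in sorted order without relying on index labels at all.
ids_in_order = [row[7] for row in dataS.itertuples(index=False, name=None)]

# Relabel the rows 0..n-1 after sorting, then loop by label as in the question.
dataS_reset = dataS.reset_index(drop=True)
ids_by_loop = [dataS_reset.loc[i, 7] for i in range(len(dataS_reset))]
```

In a for loop over i = 0, 1, 2, ..., either dataS.iloc[i, 7] on the sorted frame or dataS_reset.loc[i, 7] on the relabeled frame follows the sorted order; itertuples avoids indexing altogether.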