python pandas

Pandas dataframe not saving after sort


I'm trying to sort a dataframe by a column. I'm writing a function that returns the top n rows, ranked by the totals column.

Here is my function:

def get_most(self, column, amt):
    most = OrderedDict()
    self.data = self.data.sort_values(by=[column])
    for i in range(amt):
        most.update({i : self.data.loc[i, :]})
    return most

When I call the function like so:

most_amt = self.get_most('Total', 3)
for key, value in most_amt.items():
    print(key, value)

It returns the first 3 rows of the dataframe as they were before the sort. I've also tried using the inplace parameter, like so:

self.data.sort_values(by=[column], inplace=True)

But to no avail.

The app itself is a little tracker I'm making for myself to follow the spread of the coronavirus. I'm using data from a GitHub repo, and the input is a 312-row CSV file. Some of the rows are shown below (I added spaces in this question to make them easier to read; there are no spaces in the actual file):

Hubei,China,2020-03-21T10:13:08,67800,3139,58946
NaN,  Italy,2020-03-21T17:43:03,53578,4825, 6072
NaN,  Spain,2020-03-21T13:13:30,25374,1375, 2125
etc. etc
NaN,  China, 2020-03-23,        81305,3259,71857
NaN,  US,    2020-03-23,        25493, 307,  171

My expected output would then be:

NaN,  China, 2020-03-23,        81305,3259,71857
NaN,  Italy,2020-03-21T17:43:03,53578,4825, 6072
NaN,  US,    2020-03-23,        25493, 307,  171

Instead, it is just the first three rows of the CSV.

Any help would be appreciated.

Thanks!


Solution

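  • I think this is the right approach: sort in descending order and slice off the first n rows by position. Sorting reorders the rows but keeps their original index labels, so a label-based lookup such as self.data.loc[i, :] still returns the rows that were first before the sort. Also note that the command in In [20] returns a new data frame rather than modifying df.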

    In [17]: data = {'zone':['us', 'italy', 'china', 'south pole'], 
        ...:         'qty': [ 10, 40, 12, 3]}                                       
    
    In [18]: df = pd.DataFrame(data)                                                
    
    In [19]: df                                                                     
    Out[19]: 
             zone  qty
    0          us   10
    1       italy   40
    2       china   12
    3  south pole    3
    
    In [20]: df.sort_values('qty', ascending=False)[:3]    #top 3                   
    Out[20]: 
        zone  qty
    1  italy   40
    2  china   12
    0     us   10
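
The same idea can be applied to the function from the question. What follows is only a minimal sketch, not the asker's actual class: it reuses the toy zone/qty data from the session above, and get_most is rewritten here as a standalone function that takes the data frame as a hypothetical data argument instead of reading self.data. It first shows how label-based .loc and position-based .iloc differ after a sort, then takes the top rows by position.

```python
from collections import OrderedDict

import pandas as pd

# Toy data from the answer above, not the real CSV.
df = pd.DataFrame({
    "zone": ["us", "italy", "china", "south pole"],
    "qty": [10, 40, 12, 3],
})

sorted_df = df.sort_values(by="qty", ascending=False)

# Sorting reorders the rows but keeps each row's original index label,
# so a label lookup still finds the row that was first before the sort.
by_label = sorted_df.loc[0, "zone"]      # row whose label is 0
by_position = sorted_df.iloc[0]["zone"]  # row that is first after the sort


def get_most(data, column, amt):
    """Return the `amt` rows with the largest values in `column`, keyed by rank."""
    # head() selects by position, so the original index labels do not matter.
    top = data.sort_values(by=column, ascending=False).head(amt)
    most = OrderedDict()
    for rank, (_, row) in enumerate(top.iterrows()):
        most[rank] = row
    return most


most = get_most(df, "qty", 3)
for rank, row in most.items():
    print(rank, row["zone"], row["qty"])
```

Two alternatives that should behave the same way here: call reset_index(drop=True) on the sorted frame so that the labels 0, 1, 2 match the new order again, or use df.nlargest(amt, column) to sort and truncate in one step.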