I'm trying to sort a DataFrame by a column. I'm writing a function that returns the top n results from the Total column.
Here is my function:
def get_most(self, column, amt):
    most = OrderedDict()
    self.data = self.data.sort_values(by=[column])
    for i in range(amt):
        most.update({i: self.data.loc[i, :]})
    return most
When I call the function like so:
most_amt = self.get_most('Total', 3)
for key, value in most_amt.items():
    print(key, value)
It returns the first 3 rows of the DataFrame as they were before the sort. I've also tried the inplace parameter, like so:
self.data.sort_values(by=[column], inplace=True)
but to no avail.
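A minimal reproduction of why neither attempt changes the result, using a hypothetical three-row frame with made-up totals (the "Country" and "Total" names here are illustrative): sort_values reorders the rows, but each row keeps its original index label, and .loc[i, :] looks rows up by label, not by position. So label 0 is still the same row no matter where the sort moved it, while .iloc picks by position.

```python
import pandas as pd

# Hypothetical miniature frame; the numbers are made up for illustration.
df = pd.DataFrame({"Country": ["Spain", "Italy", "China"], "Total": [50, 20, 80]})

# Sorting moves the rows, and their original index labels move with them.
df.sort_values(by=["Total"], inplace=True)
print(df.index.tolist())  # [1, 0, 2]

# .loc is label-based: label 0 is still the "Spain" row, wherever it now sits.
first_by_label = df.loc[0, "Country"]

# .iloc is position-based: position 0 is the row with the smallest Total.
first_by_position = df.iloc[0]["Country"]
```

This is why looping over range(amt) with .loc returns the rows that were originally first in the file, with or without inplace=True.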
The app itself is a small tracker I made for myself to follow the spread of the coronavirus. I'm using data from a GitHub repo, and the input is a 312-row CSV file. The first three rows are as follows (I added spaces here to make them easier to read; there are no spaces in the actual file):
Hubei,China,2020-03-21T10:13:08,67800,3139,58946
NaN, Italy,2020-03-21T17:43:03,53578,4825, 6072
NaN, Spain,2020-03-21T13:13:30,25374,1375, 2125
etc.
NaN, China, 2020-03-23, 81305,3259,71857
NaN, US, 2020-03-23, 25493, 307, 171
My expected output would then be:
NaN, China, 2020-03-23, 81305,3259,71857
NaN, Italy,2020-03-21T17:43:03,53578,4825, 6072
NaN, US, 2020-03-23, 25493, 307, 171
Instead, I just get the first three rows of the CSV.
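Separately from the indexing issue, the expected output above lists the largest totals first, while sort_values sorts in ascending order by default. A small sketch with a hypothetical frame and made-up numbers showing the difference:

```python
import pandas as pd

# Hypothetical totals, for illustration only.
df = pd.DataFrame({"Country": ["Spain", "Italy", "China"], "Total": [50, 20, 80]})

ascending = df.sort_values(by="Total")                    # default: smallest first
descending = df.sort_values(by="Total", ascending=False)  # largest first

smallest_first = ascending["Country"].tolist()  # ['Italy', 'Spain', 'China']
largest_first = descending["Country"].tolist()  # ['China', 'Spain', 'Italy']
```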
Any help would be appreciated.
Thanks!
I think this is the right approach. Note that the command in In [20] returns a new DataFrame; it does not sort df in place.
In [17]: data = {'zone': ['us', 'italy', 'china', 'south pole'],
    ...:         'qty': [10, 40, 12, 3]}
In [18]: df = pd.DataFrame(data)
In [19]: df
Out[19]:
         zone  qty
0          us   10
1       italy   40
2       china   12
3  south pole    3
In [20]: df.sort_values('qty', ascending=False)[:3]  # top 3
Out[20]:
    zone  qty
1  italy   40
2  china   12
0     us   10
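Applied to the original method, a minimal sketch might look like the following. It assumes the method lives on a class holding the DataFrame in self.data; the Tracker class below is a hypothetical stand-in for that class. It sorts descending, takes the first amt rows by position with head(), and numbers them by rank, so the original index labels no longer matter. Because sort_values returns a new DataFrame here, self.data keeps its original order.

```python
from collections import OrderedDict

import pandas as pd


class Tracker:
    """Hypothetical stand-in for the asker's class; only `data` matters here."""

    def __init__(self, data: pd.DataFrame):
        self.data = data

    def get_most(self, column, amt):
        # Sort descending on a new frame (self.data is left untouched),
        # then keep the first `amt` rows by position.
        top = self.data.sort_values(by=column, ascending=False).head(amt)
        most = OrderedDict()
        # iterrows yields (index label, row); enumerate supplies the rank 0, 1, 2, ...
        for rank, (_, row) in enumerate(top.iterrows()):
            most[rank] = row
        return most


# Usage with the same toy data as the session above.
tracker = Tracker(pd.DataFrame({"zone": ["us", "italy", "china", "south pole"],
                                "qty": [10, 40, 12, 3]}))
most = tracker.get_most("qty", 3)
zones = [row["zone"] for row in most.values()]  # ['italy', 'china', 'us']

# pandas also has a one-step equivalent for "top n by a column":
top3 = tracker.data.nlargest(3, "qty")
```

If you'd rather keep the .loc loop, calling reset_index(drop=True) after sorting relabels the rows 0, 1, 2, ... so that labels and positions coincide again.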