By default, Series.value_counts is sorted by the count, in descending order:
In [192]: pd.Series([3,0,2,0,0,1,0,0,0,1,1,0,1,0,2,2,2,2,2,0,0,2]).value_counts()
Out[192]:
0 10
2 7
1 4
3 1
dtype: int64
If I pass sort=False, it appears to sort by the value (the index) instead:
In [193]: pd.Series([3,0,2,0,0,1,0,0,0,1,1,0,1,0,2,2,2,2,2,0,0,2]).value_counts(sort=False)
Out[193]:
0 10
1 4
2 7
3 1
dtype: int64
However, when I increase the length of the series, the output goes back to being ordered by count:
In [194]: pd.Series([3,0,2,0,0,1,0,0,0,1,1,0,1,0,2,2,2,2,2,0,0,2]*100).value_counts(sort=False)
Out[194]:
0 1000
2 700
1 400
3 100
dtype: int64
Any ideas what's going on here?
This is correct behavior. You asked .value_counts() not to sort the result, so it doesn't. Below I emulate what sort=True actually does, which is simply a sort_values on the counts. If you don't sort, you get the counts in the order produced by the hash table used to compute them, and that order is arbitrary. Any pattern you see in it, such as the value-sorted output in the shorter example, is a coincidence you should not rely on.
In [39]: pd.Series([3,0,2,0,0,1,0,0,0,1,1,0,1,0,2,2,2,2,2,0,0,2]).value_counts(sort=False).sort_values(ascending=False)
Out[39]:
0 10
2 7
1 4
3 1
dtype: int64
In [40]: pd.Series([3,0,2,0,0,1,0,0,0,1,1,0,1,0,2,2,2,2,2,0,0,2]*100).value_counts(sort=False).sort_values(ascending=False)
Out[40]:
0 1000
2 700
1 400
3 100
dtype: int64
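As a self-contained sketch (the variable names are illustrative, not from the answer above), the following repeats the sort=True emulation, shows how to ask explicitly for counts ordered by value with sort_index, and mimics the "count in a hash table, then sort" idea in plain Python with collections.Counter. The Counter version is only an analogy for hash-table counting, not pandas' actual internals.

```python
from collections import Counter

import pandas as pd

data = [3, 0, 2, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 2, 2, 2, 2, 2, 0, 0, 2]
s = pd.Series(data * 100)

# Unsorted counts: treat the order as arbitrary and do not rely on it.
unsorted_counts = s.value_counts(sort=False)

# Emulate sort=True: sort the counts themselves, largest first.
by_count = unsorted_counts.sort_values(ascending=False)

# If you want the result ordered by the values (the index),
# request that explicitly instead of relying on sort=False.
by_value = s.value_counts().sort_index()

# Plain-Python analogy: count with Counter (a dict, i.e. a hash table),
# then sort by count as a separate step.
counter = Counter(data * 100)
manual = sorted(counter.items(), key=lambda kv: kv[1], reverse=True)

print(by_count)
print(by_value)
print(manual)
```

Because there are no tied counts in this data, sorting by count gives a unique order, so by_count matches the default value_counts() output exactly.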