I have a dataframe with a column that has numerical values. This column is not well-approximated by a normal distribution. Given another numerical value, not in this column, how can I calculate its percentile in the column? That is, if the value is greater than 80% of the values in the column but less than the other 20%, it would be in the 20th percentile.
Sort the column, and see if the value is in the first 20% or whatever percentile.
for example:
def in_percentile(my_series, val, perc=0.2):
myList=sorted(my_series.values.tolist())
l=len(myList)
return val>myList[int(l*perc)]
Or, if you want the actual percentile simply use searchsorted
:
my_series.values.searchsorted(val)/len(my_series)*100