python pandas quartile

Assign label in new column based on quartile range of values


I'm attempting to label values based on the quartile range of one column in my dataset, but I'm having trouble combining two steps. Here's a toy dataset:

fruit   rating_store   rating_home    

apple   1.0            .8
pear    .8             .9
berry   .9             .4
tomato  .7             .5
orange  .3             .6
banana  .2             .4
...     ...            ...

First, I'm trying to identify the quartile range of rating_home, which I can do with:

qrating_home = pd.cut(df['rating_home'], 4).value_counts().reset_index()

However, I'm now having trouble assigning labels (e.g., "low", "low_med", "high_med", "high") to the qrating_home range of values in the dataset. Desired output:

fruit   rating_store   rating_home   rating_home_quartile 

apple   1.0            .8            high
pear    .8             .9            high
berry   .9             .4            low
tomato  .7             .5            low
orange  .3             .6            low_med
banana  .2             .4            low
...     ...            ...

This post was very helpful but hardcoded the ranges: "How to categorize a range of values in Pandas DataFrame". Because my dataset may change as more data comes in, I need to calculate the ranges each time I run my code. Thanks for any help!


Solution

  • I think you want:

    df['rating_home_quartile'] = pd.cut(df['rating_home'], bins=4, 
                                         labels=['low', 'low_med', 'high_med', 'high'])
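
    One thing to be aware of: pd.cut with bins=4 splits the min-to-max range into four equal-width intervals, which is not the same thing as quartiles. If you want bins that each hold roughly a quarter of the rows, pd.qcut is probably the closer fit. Here is a minimal sketch on the six rows shown in the question (the "..." rows are left out, so results on your full data may differ); the rating_home_qcut column name is only made up for the comparison:

```python
import pandas as pd

# Toy data copied from the question; the elided "..." rows are omitted.
df = pd.DataFrame({
    "fruit": ["apple", "pear", "berry", "tomato", "orange", "banana"],
    "rating_store": [1.0, 0.8, 0.9, 0.7, 0.3, 0.2],
    "rating_home": [0.8, 0.9, 0.4, 0.5, 0.6, 0.4],
})

labels = ["low", "low_med", "high_med", "high"]

# Equal-width bins: the min-max range is cut into 4 intervals of equal length.
# The edges are recomputed from the data on every run, so nothing is hardcoded.
df["rating_home_quartile"] = pd.cut(df["rating_home"], bins=4, labels=labels)

# Quantile-based bins: edges sit at the 25th/50th/75th percentiles,
# so each bin gets roughly the same number of rows.
# "rating_home_qcut" is a hypothetical column name used only for comparison.
df["rating_home_qcut"] = pd.qcut(df["rating_home"], q=4, labels=labels)

print(df)
```

    On these six rows the pd.cut column reproduces the desired output from the question, while pd.qcut labels tomato and orange differently. Note that pd.qcut can raise an error about duplicate bin edges when many values are tied; in that case you would need to decide how to handle the ties.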