I'm trying to label values based on the quartile range of one column in my dataset, but I'm having trouble combining the two steps. Here's a toy dataset:
fruit rating_store rating_home
apple 1.0 .8
pear .8 .9
berry .9 .4
tomato .7 .5
orange .3 .6
banana .2 .4
... ... ...
First, I'm trying to identify the quartile range of rating_home, which I can do with:
qrating_home = pd.cut(df['rating_home'], 4).value_counts().reset_index()
However, I'm now having trouble assigning labels (e.g., "low", "low_med", "high_med", "high") to the rating_home values in the dataset based on those ranges. Desired output:
fruit rating_store rating_home rating_home_quartile
apple 1.0 .8 high
pear .8 .9 high
berry .9 .4 low
tomato .7 .5 low
orange .3 .6 low_med
banana .2 .4 low
... ... ... ...
This post was very helpful, but it hardcoded the ranges: "How to categorize a range of values in Pandas DataFrame". Because my dataset may change as more data comes in, I need to recalculate the ranges each time I run my code. Thanks for any help!
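I think you want pd.cut with bins=4 and a list of labels, which recomputes four equal-width bins from the data each time it runs:
df['rating_home_quartile'] = pd.cut(df['rating_home'], bins=4,
                                    labels=['low', 'low_med', 'high_med', 'high'])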
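For reference, here is a minimal end-to-end sketch on the six toy rows from the question (the "..." rows are left out, so the DataFrame below is an assumption about what df looks like). It also shows pd.qcut as a possible alternative: pd.cut splits the min-to-max range into equal-width bins, whereas pd.qcut splits on quantiles so each bin holds roughly the same number of rows. The rating_home_qcut column name is only illustrative.

```python
import pandas as pd

# Toy data copied from the question; the real dataset is assumed to have more rows.
df = pd.DataFrame({
    "fruit": ["apple", "pear", "berry", "tomato", "orange", "banana"],
    "rating_store": [1.0, 0.8, 0.9, 0.7, 0.3, 0.2],
    "rating_home": [0.8, 0.9, 0.4, 0.5, 0.6, 0.4],
})

labels = ["low", "low_med", "high_med", "high"]

# Equal-width bins: the range between min and max of rating_home is cut into
# 4 intervals of the same width. retbins=True also returns the computed edges,
# which are recalculated from the data on every run.
df["rating_home_quartile"], edges = pd.cut(
    df["rating_home"], bins=4, labels=labels, retbins=True
)

# Quantile-based bins (illustrative column name): edges sit at the 25th, 50th
# and 75th percentiles, so each label covers roughly a quarter of the rows.
df["rating_home_qcut"] = pd.qcut(df["rating_home"], q=4, labels=labels)

print(edges)
print(df)
```

On this toy data the pd.cut column reproduces the desired output from the question, while the pd.qcut column labels tomato and orange differently because its edges follow the distribution of the values rather than their range. Note that pd.qcut can raise a ValueError about non-unique bin edges when many values are tied, so which function fits better depends on the real data.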