I am plotting a DataFrame, df, with columns x and y as a scatter plot. For a given x value there are often several y values, and some of them lie far from the rest. I want to remove the y outliers separately for each x. This is different from bulk outlier removal, where a single IQR is computed over the whole y column.
Can anyone help with this?
I was not able to find ready-made code for this. The examples I found remove outliers in bulk, not selectively for each x.
Group the DataFrame by the x column and then apply a function that removes outliers from each group:
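import pandas as pd

def remove_outliers(group, column='y'):
    # Quartiles and interquartile range of y within this x group only
    Q1 = group[column].quantile(0.25)
    Q3 = group[column].quantile(0.75)
    IQR = Q3 - Q1
    # Tukey fences: values more than 1.5 * IQR outside the quartiles are outliers
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return group[(group[column] >= lower_bound) & (group[column] <= upper_bound)]

# Selecting the columns explicitly keeps x in each group (newer pandas warns otherwise)
df = df.groupby('x')[list(df.columns)].apply(remove_outliers).reset_index(drop=True)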
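A vectorized alternative, sketched under the assumption that the columns are named x and y as in the question: instead of apply, compute each row's group quartiles with groupby(...).transform and build one boolean mask over the whole DataFrame. This keeps the original row order and index and avoids the pandas deprecation around apply operating on grouping columns. The function name remove_outliers_per_x and the k parameter are hypothetical, introduced only for this illustration.

```python
import pandas as pd

def remove_outliers_per_x(df, x='x', y='y', k=1.5):
    """Hypothetical helper: drop rows whose y is an IQR outlier within its own x group.

    k=1.5 reproduces the fences used in remove_outliers above.
    """
    grouped = df.groupby(x)[y]
    # transform broadcasts each group's quartile back onto every row of that group
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    # Keep rows inside [Q1 - k*IQR, Q3 + k*IQR] of their own x group (bounds inclusive)
    mask = df[y].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

# Illustrative data: y=100 is an outlier at x=1 but typical at x=2, and vice versa for y=10
df = pd.DataFrame({
    'x': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'y': [10, 11, 12, 13, 100, 100, 101, 102, 103, 10],
})
clean = remove_outliers_per_x(df)
print(clean)
```

Bulk IQR filtering on this data would treat 10 and 100 the same way at every x. The per-group version drops y=100 only at x=1 and y=10 only at x=2. Note that quartiles computed from very small groups (a handful of points per x) are unstable, so the filter is only as meaningful as the number of y values available for each x.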