I want to filter outliers within the tidyverse framework, in a single pipe.
For this example, an outlier is any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where Q1 is the 25th percentile, Q3 is the 75th percentile, and IQR is the interquartile range, IQR = Q3 - Q1.
I managed to compute the upper and lower bounds for outliers, and I am familiar with the filter() function from dplyr. However, I do not know how to get the values calculated inside summarise() back into the complete data.frame within the same pipe operation. This is what I have so far:
iris %>%
  group_by(Species) %>%
  # filter(API_Psy_dm <=)
  summarise(IQR = IQR(Sepal.Length),
            O_upper = quantile(Sepal.Length, probs = 0.75, na.rm = FALSE) + 1.5 * IQR,
            O_lower = quantile(Sepal.Length, probs = 0.25, na.rm = FALSE) - 1.5 * IQR)
Is this even possible? Or would I need a second pipe? Or is there a more convenient way than to calculate the upper and lower limit myself?
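Use mutate instead of summarize, and then filter. summarize collapses each group to a single row, whereas mutate keeps every row and adds the per-group bounds as columns, so filter can compare each value against them: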
library(dplyr)

iris %>%
  group_by(Species) %>%
  # mutate() keeps all rows; the bounds are computed per Species
  mutate(IQR = IQR(Sepal.Length),
         O_upper = quantile(Sepal.Length, probs = 0.75, na.rm = FALSE) + 1.5 * IQR,
         O_lower = quantile(Sepal.Length, probs = 0.25, na.rm = FALSE) - 1.5 * IQR) %>%
  # keep only the rows that lie within the bounds
  filter(O_lower <= Sepal.Length & Sepal.Length <= O_upper)
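
On the question of a more convenient way: the helper columns are optional, because filter() also evaluates its condition per group on a grouped data frame. The following is only a sketch of that idea. is_outlier() is a hypothetical helper written for this example, not a dplyr function, and na.rm = TRUE and the trailing ungroup() are assumptions that the original code does not make.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): TRUE where x lies outside
# the k * IQR fences around the first and third quartiles.
is_outlier <- function(x, k = 1.5) {
  # Assumption: missing values are ignored when computing the quartiles
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[[2]] - q[[1]])
  x < q[[1]] - fence | x > q[[2]] + fence
}

iris_clean <- iris %>%
  group_by(Species) %>%
  # On grouped data the condition is evaluated separately for each Species
  filter(!is_outlier(Sepal.Length)) %>%
  # Drop the grouping so later steps operate on the whole table
  ungroup()
```

This version leaves no IQR, O_upper or O_lower columns in the result, which may or may not be what you want; keep the mutate approach if you need to inspect the bounds afterwards.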