I have a dataset that has 2 columns; column A is State_Name and has 5 different options of state, and column B is Total_Spend which has the average total spend of that state per day. There are 365 observations for each state.
What I want to do is count the number of outliers PER STATE using the 1.5 IQR rule and save the count of outliers per state to a new df or table.
So I would expect an output something like:
State | Outlier Count |
---|---|
ATL | 5 |
GA | 20 |
MI | 11 |
NY | 50 |
TX | 23 |
I have managed to get it to work by doing it one state at a time but I can't figure out what to do to achieve this in a single go.
Here is my code at the moment (to return the result for a single state):
daily_agg %>%
select(State_Name, Total_Spend) %>%
filter(State_Name == "NY")
outlier_NY <- length(boxplot.stats(outlier_df$Total_Spend)$out)
Any help would be appreciated.
Thanks!
EDIT WITH TEST DATASET
outlier_mtcars <-
df %>%
select(cyl, disp) %>%
filter(cyl == "6")
outliers <- length(boxplot.stats(outlier_mtcars$disp)$out)
The above shows me 1 outlier for 6 cyl cars but I want a table that shows how many outliers for 4, 6, 8 cyl cars
Since I'm not very familiar with the function boxplot.stats
, I didn't use this in my solution and instead manually calculates 1.5 * IQR + upper quantile.
Here mtcars
was used as an example. For the records that are outliers, they are "flagged" as TRUE
, where we can sum them up in summarize
.
library(dplyr)
mtcars %>%
group_by(cyl) %>%
mutate(flag = disp >= (IQR(disp) * 1.5 + quantile(disp, probs = 0.75)), .keep = "used") %>%
summarize(Outlier = sum(flag))
# A tibble: 3 × 2
cyl Outlier
<dbl> <int>
1 4 0
2 6 1
3 8 0