Differences between manually and ggplot-calculated boxplot statistics / outlier elimination

Note: Question edited to simplify the data.

I have a few datasets combined in a dataframe, which I want to eliminate outliers from. When trying different ways to calculate upper and lower thresholds I came upon discrepacies between the results of ggplot-boxplots and manual calculation. I would like to (1) understand the discrepancies and (2) find a convenient way to eliminate outliers from multiple similar datasets via dplyr.

Example data (taken from the boxplot.stats help page):

library(tidyverse)
require(stats)
x <- c(1:100, 1000)
(b1 <- boxplot.stats(x))

This returns ymin = 1 and max = 100 and 1000 as an obvious outlier.

My initial approach can be reproduced with this data, too... to have it complete, here is my original dataset too (however discussion is based on the simpler one).

# small dataset
values_A30 <- c(0.2079762,0.2605029,0.3054334,0.304067,0.8696487,0.3470931,0.2560001,0.3096838,0.2887556,0.3741472,0.2178375,0.2234682,0.2628923,0.2745458,0.5438208,0.7068278,0.6492924,1.100175,0.2740491,0.2849299,0.7562737,0.3009749,0.2598575,0.3460925,0.3265929,0.4208336,0.4353992,1.132036,0.3856708,0.1978752,0.3676808,0.4196799,0.4486595,0.3282394,0.3725664,0.385373,0.3680049,0.7875058,0.8098903,0.741165,1.260887,0.3521471,0.3883195,1.17124,0.3225514,0.3492051)

bpdata <- data.frame(Var1 = "A", SSW = 30, Value = values_A30)

Usually I start up with ggplot boxplots to get a visual impression.

# test plot
ggplot(as.data.frame(x),
       aes(x = NA, y = x)) +
  geom_boxplot()

#extract stats
gg_stats <- ggplot_build(
  ggplot(as.data.frame(x),
         aes(x = NA, y = x)) +
    geom_boxplot()
)$data[[1]] |>
  select(1:9)

# manual stats
man_stats <- as.data.frame(x) |>
  summarise(
    Qu1 = quantile(x, 0.25),
    Qu3 = quantile(x, 0.75),
    IQR = IQR(x)
  ) |>
  mutate(ymin = Qu1 - (1.5 * IQR),
         ymax = Qu3 + (1.5 * IQR))

Both, boxplot.stats and ggplot_build, lead to the same results. The manual calculation, however, leads to different results, ymin = -49 and ymax = 151. The values for the quartiles and IQR are correct and the rest of the calculation seems not to be too complicated, so: What am I doing wrong?

Insights / Answer

I worked more on the problem described and found the answer with the hints below and also on [this page] (Box-plot R calculating outliers).

To sum it up:

The initially calculated limits (Q1 - 1.5 IQR and Q3 + 1.5 IQR) are before outlier elimination, while the ymin/ymax output of the boxplot-stats (ggplot or R core boxplot function) are based on the actual data points and after outlier elimination. That was what confused me.
There is further complexity possible based on the underlying algorithm (type =) in the quantile and IQR functions.

Thanks again for help @Michiel Duvekot

Solution

It's perhaps easier to see what's happening if you use a very simple range of values

Value <- 0:100
Qu1  <- Value  |> quantile(0.25)
Qu3  <- Value  |>  quantile(0.75)
IQR <- Value  |>  IQR()

min <- Qu1 - (1.5 * IQR)
max <- Qu3 + (1.5 * IQR)

cat(min, max)

gives

-50 150

if you want to know which values are considered outliers, you can use

(bpdata %>% filter(Var1 == "A", SSW == 30) %>% pull(Value)  %>% boxplot.stats())$out

If you want to calculate the min and max yourself, you can do what boxplt.stats does:

y <- bpdata %>% filter(Var1 == "A", SSW == 30) %>% pull(Value)

stats <- stats::fivenum(y, na.rm = TRUE)
iqr <- diff(stats[c(2, 4)])
coef <- 1.5
min <- (stats[2] - coef * iqr)
max <- (stats[4] + coef * iqr)