From my reading of the documentation associated with ggpubr::add_summary()
, stats::IQR()
and stats::quantile()
I expected my use of median_iqr
and median_q1q3
to return the same intervals. The documentation for IQR()
says it's using quantile()
... and yet, doesn't return the values I expected. Can someone explain?
sapply(c("reprex", "dplyr", "ggpubr"), require, character=TRUE)
tg <- ToothGrowth
# first two same
tg |>
group_by(dose) |>
summarise(med = median(len),
iqr = median_q1q3(len))
#> # A tibble: 3 × 3
#> dose med iqr$y $ymin $ymax
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.5 9.85 9.85 7.22 12.2
#> 2 1 19.2 19.2 16.2 23.4
#> 3 2 26.0 26.0 23.5 27.8
tg |>
group_by(dose) |>
summarise(med = median(len),
iqr = median_hilow_(len, ci=0.5))
#> # A tibble: 3 × 3
#> dose med iqr$y $ymin $ymax
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.5 9.85 9.85 7.22 12.2
#> 2 1 19.2 19.2 16.2 23.4
#> 3 2 26.0 26.0 23.5 27.8
# this is how the above are calculated:
tg |>
group_by(dose) |>
summarise(med = median(len),
iqr = median_q1q3(len),
lo = stats::quantile(len, probs = c(.25)),
hi = stats::quantile(len, probs = c(.75)))
#> # A tibble: 3 × 5
#> dose med iqr$y $ymin $ymax lo hi
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.5 9.85 9.85 7.22 12.2 7.22 12.2
#> 2 1 19.2 19.2 16.2 23.4 16.2 23.4
#> 3 2 26.0 26.0 23.5 27.8 23.5 27.8
# 3rd one is using this derivation of +/- 1 ci
# this is how 3rd is calculated:
tg |>
group_by(dose) |>
summarise(med = median(len),
iqr = median_iqr(len),
ci = IQR(len, type = 7)) |>
mutate(lo = med - ci,
hi = med + ci)
#> # A tibble: 3 × 6
#> dose med iqr$y $ymin $ymax ci lo hi
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.5 9.85 9.85 4.82 14.9 5.03 4.82 14.9
#> 2 1 19.2 19.2 12.1 26.4 7.12 12.1 26.4
#> 3 2 26.0 26.0 21.6 30.2 4.3 21.6 30.2
Created on 2024-04-29 by the reprex package (v2.0.1)
This seems to be intended behavior on the part of the {ggpubr}
package authors.
If you read the documentation for ?median_iqr()
very closely it states:
median_iqr(): returns the median and the error limits defined by the interquartile range.
This is supposed to convey the output will be the (a) median, (b) median less the interquartile range, and (c) median plus the interquartile range.
And in this now closed github issue, one of the package authors declined to change how median_iqr()
works and instead promoted the use of median_q1q3()
which is just a wrapper for median_hilow_(x, 0.5)
.
As you already figured out:
The median along with the 25th and 75th percentile applies to median_q1q3()
and median_hilow_(x, 0.5)
.
library(dplyr)
library(ggpubr)
library(tidyr)
library(purrr)
toothgrowth_grouped_by_dose <- ToothGrowth |> group_by(dose)
wrangle_for_left_join <- function(data, name) {
data |>
transmute(
dose,
!!name := sprintf("%05.2f [%05.2f, %05.2f]", y, ymin, ymax)
)
}
dfs_formula_1 <- list(
df_1 = toothgrowth_grouped_by_dose |>
summarise(
y = median(len),
ymin = quantile(len, probs = 0.25),
ymax = quantile(len, probs = 0.75)
) |>
wrangle_for_left_join("formula_1"),
df_2 = toothgrowth_grouped_by_dose |>
summarise(x = median_q1q3(len)) |>
unnest(cols = c(x)) |>
wrangle_for_left_join("median_q1q3"),
df_3 = toothgrowth_grouped_by_dose |>
summarise(x = median_hilow_(len, 0.5)) |>
unnest(cols = c(x)) |>
wrangle_for_left_join("median_hilow_")
)
reduce(dfs_formula_1, left_join, by = 'dose')
#> # A tibble: 3 × 4
#> dose formula_1 median_q1q3 median_hilow_
#> <dbl> <chr> <chr> <chr>
#> 1 0.5 09.85 [07.22, 12.25] 09.85 [07.22, 12.25] 09.85 [07.22, 12.25]
#> 2 1 19.25 [16.25, 23.38] 19.25 [16.25, 23.38] 19.25 [16.25, 23.38]
#> 3 2 25.95 [23.53, 27.83] 25.95 [23.53, 27.83] 25.95 [23.53, 27.83]
The median minus/plus the interquartile range applies to median_iqr()
.
dfs_formula_2 <- list(
df_4 = toothgrowth_grouped_by_dose |>
summarise(
y = median(len),
iqr = IQR(len),
ymin = y - iqr,
ymax = y + iqr
)|>
wrangle_for_left_join("formula_2"),
df_5 = toothgrowth_grouped_by_dose |>
summarise(x = median_iqr(len)) |>
unnest(cols = c(x)) |>
wrangle_for_left_join("median_iqr")
)
reduce(dfs_formula_2, left_join, by = 'dose')
#> # A tibble: 3 × 3
#> dose formula_2 median_iqr
#> <dbl> <chr> <chr>
#> 1 0.5 09.85 [04.82, 14.88] 09.85 [04.82, 14.88]
#> 2 1 19.25 [12.12, 26.38] 19.25 [12.12, 26.38]
#> 3 2 25.95 [21.65, 30.25] 25.95 [21.65, 30.25]
Created on 2024-04-29 with reprex v2.1.0