I am working with multiple variables, where I would like to run a robust correlation and then extract the 95% confidence intervals. I am able to do this using pbcor
from the WRS2
package.
However, when I want to plot these values, I use ggcorrmat
from the ggstratsplot
package. As I was checking the confidence intervals between the two outputs, I noticed they do not match up.
Here is a sample of my dataset:
Individual varA varB
1 2.9380842 0.09896456
2 2.9380842 -1.38772037
3 -0.6879859 -2.41310243
4 -0.6879859 0.55722346
5 -2.3129564 -1.34140699
6 -2.3129564 -1.75604301
7 -0.4937431 0.78381085
8 -0.4937431 0.38320385
9 -0.8558126 0.82125672
10 -0.8558126 0.06346062
11 -0.9211026 -1.67170174
Respective code/outputs using this sample dataset:
WRS2::pbcor(data$varA, data$varB, ci=TRUE, nboot=1000, beta=0.1)
> robust correlation coefficient: 0.275
> test statistic: 0.8582
> p-value:0.41307
> bootstrap CI: [-0.3564; 0.7792]
ggstatsplot::ggcorrmat(data, cor.vars = c(OFT1, PC1), output = "dataframe", matrix.type = "lower", type = "robust", beta = 0.1, sig.level = 0.05, conf.level = 0.95, nboot = 1000)
>robust correlation: 0.275
>test statistic: 0.858
>p-value: 0.413
>CI: [-0.389, 0.751]
Why are the confidence intervals different, but the correlation values are the same?
You are right that the CIs differ between WRS2
and ggstatsplot
because ggstatsplot
internally doesn't use bootstrapping (which is slower and computationally costly) to compute the CIs.
Input <- ("
Individual varA varB
1 2.9380842 0.09896456
2 2.9380842 -1.38772037
3 -0.6879859 -2.41310243
4 -0.6879859 0.55722346
5 -2.3129564 -1.34140699
6 -2.3129564 -1.75604301
7 -0.4937431 0.78381085
8 -0.4937431 0.38320385
9 -0.8558126 0.82125672
10 -0.8558126 0.06346062
11 -0.9211026 -1.67170174
")
# creating a dataframe
df <- read.table(textConnection(Input), header = TRUE)
set.seed(123)
WRS2::pbcor(df$varA, df$varB, ci = TRUE, nboot = 1000, beta = 0.1)
#> Call:
#> WRS2::pbcor(x = df$varA, y = df$varB, beta = 0.1, ci = TRUE,
#> nboot = 1000)
#>
#> Robust correlation coefficient: 0.275
#> Test statistic: 0.8582
#> p-value: 0.41307
#>
#> Bootstrap CI: [-0.4476; 0.8223]
set.seed(123)
ggstatsplot::ggcorrmat(
data = dplyr::select(df, -Individual),
type = "robust",
output = "dataframe",
nboot = 1000,
beta = 0.1
)
#> # A tibble: 1 x 10
#> parameter1 parameter2 r ci_low ci_high t df p method nobs
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int>
#> 1 varA varB 0.275 -0.389 0.751 0.809 9 0.439 Percentage~ 11
It instead returns non-central confidence intervals for effect sizes whenever it can.
If you are curious, the relevant piece of code used to compute CIs is here: https://github.com/easystats/correlation/blob/ddd105da55c8b5a81e4ce97b8938f5f00e6e968b/R/cor_to_ci.R#L70-L85