rcorrelationconfidence-intervalrobustggcorrplot

pbcor and ggcorrmat correlations give different confidence intervals in R


I am working with multiple variables, where I would like to run a robust correlation and then extract the 95% confidence intervals. I am able to do this using pbcor from the WRS2 package.

However, when I want to plot these values, I use ggcorrmat from the ggstratsplot package. As I was checking the confidence intervals between the two outputs, I noticed they do not match up.

Here is a sample of my dataset:

Individual  varA    varB
1   2.9380842   0.09896456
2   2.9380842   -1.38772037
3   -0.6879859  -2.41310243
4   -0.6879859  0.55722346
5   -2.3129564  -1.34140699
6   -2.3129564  -1.75604301
7   -0.4937431  0.78381085
8   -0.4937431  0.38320385
9   -0.8558126  0.82125672
10  -0.8558126  0.06346062
11  -0.9211026  -1.67170174

Respective code/outputs using this sample dataset:

WRS2::pbcor(data$varA, data$varB, ci=TRUE, nboot=1000, beta=0.1) 
> robust correlation coefficient: 0.275
> test statistic: 0.8582
> p-value:0.41307
> bootstrap CI: [-0.3564; 0.7792]

ggstatsplot::ggcorrmat(data, cor.vars = c(OFT1, PC1), output = "dataframe", matrix.type = "lower", type = "robust", beta = 0.1, sig.level = 0.05, conf.level = 0.95, nboot = 1000)

>robust correlation: 0.275
>test statistic: 0.858
>p-value: 0.413
>CI: [-0.389, 0.751]

Why are the confidence intervals different, but the correlation values are the same?


Solution

  • You are right that the CIs differ between WRS2 and ggstatsplot because ggstatsplot internally doesn't use bootstrapping (which is slower and computationally costly) to compute the CIs.

    Input <- ("
              Individual  varA    varB
    1   2.9380842   0.09896456
    2   2.9380842   -1.38772037
    3   -0.6879859  -2.41310243
    4   -0.6879859  0.55722346
    5   -2.3129564  -1.34140699
    6   -2.3129564  -1.75604301
    7   -0.4937431  0.78381085
    8   -0.4937431  0.38320385
    9   -0.8558126  0.82125672
    10  -0.8558126  0.06346062
    11  -0.9211026  -1.67170174
              ")
    
    # creating a dataframe
    df <- read.table(textConnection(Input), header = TRUE)
    
    set.seed(123)
    WRS2::pbcor(df$varA, df$varB, ci = TRUE, nboot = 1000, beta = 0.1)
    #> Call:
    #> WRS2::pbcor(x = df$varA, y = df$varB, beta = 0.1, ci = TRUE, 
    #>     nboot = 1000)
    #> 
    #> Robust correlation coefficient: 0.275
    #> Test statistic: 0.8582
    #> p-value: 0.41307 
    #> 
    #> Bootstrap CI: [-0.4476; 0.8223]
    
    set.seed(123)
    ggstatsplot::ggcorrmat(
      data = dplyr::select(df, -Individual),
      type = "robust",
      output = "dataframe",
      nboot = 1000, 
      beta = 0.1
    )
    
    #> # A tibble: 1 x 10
    #>   parameter1 parameter2     r ci_low ci_high     t    df     p method       nobs
    #>   <chr>      <chr>      <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <chr>       <int>
    #> 1 varA       varB       0.275 -0.389   0.751 0.809     9 0.439 Percentage~    11
    

    It instead returns non-central confidence intervals for effect sizes whenever it can.

    If you are curious, the relevant piece of code used to compute CIs is here: https://github.com/easystats/correlation/blob/ddd105da55c8b5a81e4ce97b8938f5f00e6e968b/R/cor_to_ci.R#L70-L85