r ggplot2 ggcorrplot

Special characters for column labels in correlation matrix


I ran into an issue using ggcorr() from the GGally package. My data frame is quite simple, but the column labels contain special characters:

SAPS e' E/e'
11 11,9 5,0
14 6,2 11,1
14 7,4 8,2
15 7,5 6,8
14 11,1 7,8
13 6,6 10,5
14 10,2 6,0
13 7,1 9,1
12 10,0 6,1
15 10,8 4,9

When I run the code below, the variables are not labelled correctly in the plot. I would like the column names to appear exactly as they are. In this example, the figure shows e. instead of e' and E.e. instead of E/e'.

How can I do this? Thank you in advance.

ggcorr(ex_db, nbreaks = 4,
        label = TRUE,
        label_size = 3,
        method = c("pairwise", "spearman"))

I also tried ggcorr(ex_db, aes(x, y=c(SAPS, e', E/e')), method = c("pairwise", "spearman")) + geom_point(), without success.


Solution

  • This looks like a bug. I had a look at the source code of GGally::ggcorr.

    The issue is that your column names are not syntactically valid variable names. Unfortunately, ggcorr converts the correlation matrix to a data frame using data.frame() with the default check.names = TRUE.

    As a result, the column names are converted to syntactically valid names, which replaces the special characters with dots.
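
    The renaming can be reproduced in base R without ggcorr. This is a minimal sketch: the matrix m is a hypothetical stand-in for the correlation matrix, not the object ggcorr builds internally.

    ```r
    # Hypothetical 3 x 3 matrix standing in for the correlation matrix
    m <- diag(3)
    colnames(m) <- c("SAPS", "e'", "E/e'")

    # Default check.names = TRUE: names are passed through make.names()
    mangled <- names(data.frame(m))
    print(mangled)
    # "SAPS" "e."   "E.e."

    # check.names = FALSE keeps the names verbatim
    kept <- names(data.frame(m, check.names = FALSE))
    print(kept)
    # "SAPS" "e'"   "E/e'"
    ```

    The mangled names match the labels shown in your figure (e. and E.e.).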

    A "hacky" workaround (which may not work in general) is to manipulate the ggplot object returned by ggcorr and replace the diagLabel column, which contains the labels, with the original column names.

    Note: This requires identifying the correct geom_text layer, namely the one that adds the labels stored in the diagLabel column (i.e. the one with the mapping label = ~diagLabel). In your case it is the third layer.

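    library(GGally)

    # Build the correlation plot as before
    p <- ggcorr(ex_db,
      nbreaks = 4,
      label = TRUE,
      label_size = 3,
      method = c("pairwise", "spearman")
    )
    # Layer 3 is the geom_text layer that draws the variable labels;
    # overwrite its converted labels with the original column names
    p$layers[[3]]$data[c("diagLabel")] <- names(ex_db)

    p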
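
    If you would rather not hard-code the layer index, the layer could be looked up by checking which layer carries its own data with a diagLabel column. This is only a sketch: it assumes the installed GGally version still stores the labels in a column named diagLabel, and the helper names has_diag and idx are mine, not part of GGally. Like the workaround above, it also assumes the label rows are in the same order as the columns of ex_db.

    ```r
    library(GGally)

    # Same data as in the DATA section, repeated so the example runs on its own
    ex_db <- data.frame(
      SAPS = c(11L, 14L, 14L, 15L, 14L, 13L, 14L, 13L, 12L, 15L),
      `e'` = c(119, 62, 74, 75, 111, 66, 102, 71, 100, 108),
      `E/e'` = c(50, 111, 82, 68, 78, 105, 60, 91, 61, 49),
      check.names = FALSE
    )

    p <- ggcorr(ex_db,
      nbreaks = 4,
      label = TRUE,
      label_size = 3,
      method = c("pairwise", "spearman")
    )

    # Assumption: exactly one layer has its own data with a diagLabel column
    has_diag <- vapply(
      p$layers,
      function(layer) "diagLabel" %in% names(layer$data),
      logical(1)
    )
    idx <- which(has_diag)
    stopifnot(length(idx) == 1)

    # Restore the original column names as labels
    p$layers[[idx]]$data$diagLabel <- names(ex_db)

    p
    ```

    The stopifnot() call makes the code fail loudly if the internal structure of the plot changes, instead of silently relabelling the wrong layer.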
    

    DATA

    ex_db <- structure(list(
      SAPS = c(
        11L, 14L, 14L, 15L, 14L, 13L, 14L, 13L,
        12L, 15L
      ),
      `e'` = c(119, 62, 74, 75, 111, 66, 102, 71, 100, 108),
      `E/e'` = c(50, 111, 82, 68, 78, 105, 60, 91, 61, 49)
    ), class = "data.frame", row.names = c(NA, -10L))