I am working on a dataset with a single regressor, and d
many response variables. While working with that, I have encountered a dataframe of correlations, with 1 + dC2 = 1 + d(d-1)/2
which exactly looks like the ones that can be found here. I'm inserting an example of such a dataframe:
X r_{1,2} r_{1,3} r_{1,4} r_{1,5} r_{2,3} r_{2,4} r_{.,.}
1 21 0.6002993 0.923644 0.8184414 0.3721132 0.9337539 0.6569090 .........
2 22 0.6498641 0.358339 0.9511748 0.1091543 0.6651190 0.9960394 .........
3 23 0.6825716 0.117533 0.8900186 0.9256916 0.9253819 0.6588873 .........
4 24 0.8280786 0.324110 0.6634117 0.7292685 0.0207334 0.9122315 .........
5 25 0.9520840 0.642721 0.5012283 0.2722650 0.2582217 0.3901019 .........
6 26 0.8714017 0.157062 0.8675581 0.5384571 0.6053657 0.5930488 .........
. . .
. . .
. . .
The dataframe has the first column to be the time variable X
, and any other column with title r_{i,j}
contains the time-varying correlations between the i
-th and j
-th variables.
Now, I am trying to make a matrix of plots of time-varying correlations for every pair of variables, using ggplot2
or similar packages. This is probably not very difficult to do manually, but since I have d
many variables, the process has to be automated. Essentially, my desired plot looks like the following:
Note that:
X
in the dataframe, with axis texts in the far bottom.Any help will be appreciated. Please let me know if any clarification is neede. Thanks!
This is a bit different from a standard pairs plot, such as you might produce with GGally::ggpairs
, since the x axis is constant across all plots.
If you want to build this yourself then you could automate using the patchwork package.
First, work out the row and column numbers that are going to appear in your plot, and what their names would be if they were columns in your data frame, even for the blank spaces in your plot that do not have a corresponding column in your data frame:
library(ggplot2)
library(patchwork)
n <- which(choose(seq(20), 2) == ncol(df) - 1)
pairs <- apply(expand.grid(seq(n-1) + 1, seq(n - 1)), 1, function(x) {
paste0("r_{", x[2], ",", x[1], "}") })
pairs <- c(t(matrix(pairs, nrow = n - 1)))
pairs
#> [1] "r_{1,2}" "r_{2,2}" "r_{3,2}" "r_{4,2}" "r_{1,3}" "r_{2,3}" "r_{3,3}"
#> [8] "r_{4,3}" "r_{1,4}" "r_{2,4}" "r_{3,4}" "r_{4,4}" "r_{1,5}" "r_{2,5}"
#> [15] "r_{3,5}" "r_{4,5}"
Now we can create a plot for member of pairs
. If it does not exist as a column in the data frame, we create an empty plot. If it is on the top row or the rightmost column, we add a single facet. If it is on the bottom or right, we use axis text, and remove it otherwise:
plots <- lapply(pairs, function(i) {
is_bottom <- grepl(paste0(",", n), i)
is_right <- grepl(paste0(n - 1, ","), i)
is_left <- grepl("1,", i)
is_top <- grepl(",2", i)
is_col <- i %in% names(df)
rowname <- sub("^r_\\{(\\d+),.*$", "\\1", i)
colname <- sub("^.*,(\\d+).*$", "\\1", i)
if(!is_col) {
p <- ggplot(df, aes(X, runif(length(x)))) + geom_blank()
} else {
p <- ggplot(df, aes(X, .data[[i]])) +
geom_line(linewidth = 1.5, color = "gray50", lineend = "round")
}
p <- p + scale_y_continuous(NULL, limits = c(0, 1)) +
scale_x_continuous(NULL, 1:10) +
theme_classic()
f <- quote(.~.)
if(is_top) f[3] <- rowname
if(is_right) f[2] <- colname
if(is_top | is_right) p <- p + facet_grid(f)
p + theme(axis.text.x = if(is_bottom) element_text() else element_blank(),
axis.title.x = if(is_bottom) element_text() else element_blank(),
axis.text.y = if(is_left) element_text() else element_blank(),
panel.border = if(is_col) element_rect(fill = NA) else element_blank(),
axis.ticks = if(is_col) element_line() else element_blank(),
axis.line = element_blank())
})
To plot it, we can do:
Reduce(`+`, plots) +
plot_annotation(caption = "X", theme = theme(
plot.caption = element_text(hjust = 0.5, size = 12)))
EDIT
To plot a facet with the column name above each panel, change the creation of plots
in the above code to:
plots <- lapply(pairs, function(i) {
is_bottom <- grepl(paste0(",", n), i)
is_right <- grepl(paste0(n - 1, ","), i)
is_left <- grepl("1,", i)
is_top <- grepl(",2", i)
is_col <- i %in% names(df)
if(!is_col) {
p <- ggplot(df, aes(X, runif(length(X)))) + geom_blank()
} else {
p <- ggplot(df, aes(X, .data[[i]])) +
geom_line(linewidth = 1.5, color = "gray50", lineend = "round")
}
p <- p + scale_y_continuous(NULL, limits = c(0, 1)) +
scale_x_continuous(NULL, 1:10) +
theme_classic()
f <- quote(.~.)
f[3] <- i
if(is_col) p <- p + facet_grid(f)
p + theme(axis.text.x = if(is_bottom) element_text() else element_blank(),
axis.title.x = if(is_bottom) element_text() else element_blank(),
axis.text.y = if(is_left) element_text() else element_blank(),
panel.border = if(is_col) element_rect(fill = NA) else element_blank(),
axis.ticks = if(is_col) element_line() else element_blank(),
axis.line = element_blank())
})
And when plotted, this produces
Data used
set.seed(1)
df <- cbind(X = 1:10,
replicate(10, sort(runif(10))) |>
as.data.frame() |>
setNames(combn(1:5, 2, FUN = \(x) paste0("r_{", x[1], ",", x[2], "}"))))
df
#> X r_{1,2} r_{1,3} r_{1,4} r_{1,5} r_{2,3} r_{2,4} r_{2,5}
#> 1 1 0.06178627 0.1765568 0.01339033 0.1079436 0.0233312 0.07067905 0.08424691
#> 2 2 0.20168193 0.2059746 0.12555510 0.1862176 0.4772301 0.09946616 0.25801678
#> 3 3 0.26550866 0.3800352 0.21214252 0.4112744 0.5297196 0.24479728 0.29360337
#> 4 4 0.37212390 0.3841037 0.26722067 0.4820801 0.5530363 0.31627171 0.33239467
#> 5 5 0.57285336 0.4976992 0.34034900 0.4935413 0.6470602 0.40683019 0.45906573
#> 6 6 0.62911404 0.6870228 0.38238796 0.5995658 0.6927316 0.43809711 0.47854525
#> 7 7 0.66079779 0.7176185 0.38611409 0.6684667 0.7323137 0.47761962 0.65087047
#> 8 8 0.89838968 0.7698414 0.65167377 0.7237109 0.7829328 0.51863426 0.76631067
#> 9 9 0.90820779 0.7774452 0.86969085 0.7942399 0.7893562 0.66200508 0.87532133
#> 10 10 0.94467527 0.9919061 0.93470523 0.8273733 0.8209463 0.86120948 0.91287592
#> r_{3,4} r_{3,5} r_{4,5}
#> 1 0.3337749 0.1216919 0.05893438
#> 2 0.3390729 0.1433044 0.23962942
#> 3 0.3466835 0.2026923 0.41008408
#> 4 0.3899895 0.2454885 0.45527445
#> 5 0.4763512 0.3253522 0.60493329
#> 6 0.7773207 0.3999944 0.64228826
#> 7 0.8394404 0.4346595 0.77891468
#> 8 0.8643395 0.7111212 0.79730883
#> 9 0.8921983 0.7125147 0.81087024
#> 10 0.9606180 0.7570871 0.87626921
Created on 2023-07-30 with reprex v2.0.2