I have a time-series panel dataset which is structured in the following way:
df <- data.frame(
year = c(2012L, 2013L, 2014L, 2012L, 2013L, 2014L, 2015L),
id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
col1 = c(11L, 13L, 13L, 16L, 15L, 15L, 16L),
col2 = c(10L, 14L, 12L, 13L, 11L, 16L, 17L),
col3 = c(17L, 12L, 12L, 14L, 19L, 21L, 12L),
)
> df
year id col1 col2 col3
1 2012 1 11 10 17
2 2013 1 13 14 12
3 2014 1 13 12 12
4 2012 2 16 13 14
5 2013 2 15 11 19
6 2014 2 15 16 21
7 2015 2 16 17 12
>
I would like to generate a cross-sectional lower triangle correlation latex table across each column pair and across all groups but I want the final table to be the average across all groups and also include the p stat. This is what I have so far using dplyr:
library(dplyr)
df %>%
group_by(id) %>%
summarize(COR=cor(col1,col2))
But I would like to have this for all column pairs and in my actual dataset, I have many more ids. I would like to use xtable, stargazer, or Hmisc to generate a latex correlation table that has the average corr across groups as the output and also includes the p-value. I would like my final output to look like something like this: imgur.com/a/7Jwmm8f
An option would be to split
by 'id' column, then apply the cor
on the 'col' columns, get the elementwise +
and divide by the length
of unique
'id' and replace the upper.tri
values to NA
out <- Reduce(`+`, lapply(split(df[3:5], df$id),
function(x) cor(x, use = "complete.obs")))/length(unique(df$id))
out[upper.tri(out)] <- NA
-output
out
# col1 col2 col3
#col1 1.0000000 NA NA
#col2 0.5902554 1.000000 NA
#col3 -0.9807620 -0.569806 1
Or using tidyverse
library(dplyr)
library(purrr)
library(magrittr)
df %>%
select(-year) %>%
group_split(id, .keep = FALSE) %>%
map(cor, use = "complete.obs") %>%
reduce(`+`) %>%
divide_by(n_distinct(df$id)) %>%
replace(., upper.tri(.), NA)
# col1 col2 col3
#col1 1.0000000 NA NA
#col2 0.5902554 1.000000 NA
#col3 -0.9807620 -0.569806 1