I have a panel dataset looking like this:
head(panel_data)
date symbol close rv rv_plus rv_minus rskew rkurt Mkt.RF SMB HML
1 1999-11-19 a 25.4 19.3 6.76 12.6 -0.791 4.36 -0.11 0.35 -0.5
2 1999-11-22 a 26.8 10.1 6.44 3.69 0.675 5.38 0.02 0.22 -0.92
3 1999-11-23 a 25.2 8.97 2.56 6.41 -1.04 4.00 -1.29 0.08 0.3
4 1999-11-24 a 25.6 5.81 2.86 2.96 -0.505 5.45 0.87 0.08 -0.89
5 1999-11-26 a 25.6 2.78 1.53 1.25 0.617 5.60 0.23 0.92 -0.2
6 1999-11-29 a 26.1 5.07 2.76 2.30 -0.236 7.27 -0.6 0.570 -0.14
where the variable symbol depicts different stocks. I want to calculate the time-series average of the cross-sectional correlation between the variables rskew and rkurt. This means I need to compute the correlation between rskew and rkurt over all different stocks at each point in time and then calculate the time-series average afterwards.
I tried to do it with the rollapply
function from the zoo
package, but since the number of different stocks is not the same for all dates, I cannot simply define width
as an integer. Here is what i tried for a sample width
of 20:
panel_data <- panel_data %>%
group_by(date) %>%
mutate(cor_skew_kurt = rollapply(data = panel_data[7:8],
width=20,
FUN=cor,
align="right",
na.rm=TRUE,
fill=NA)) %>%
ungroup
Is there a way to do this without having to define a fixed width
for each date group?
Or should I maybe use a different approach to do this?
Can you try running the below code? I have recreated an example emulating your issue. if I understood your problem correctly this code should at least put you on the path to the right solution as it solves the issue of unequal time window length.
###################
#Recreating an example dataset with unequal dates across stocks
seed(1)
date6 <- c('1999-11-19','1999-11-22','1999-11-23','1999-11-24','1999-11-26','1999-11-29')
date5 <- c('1999-11-19','1999-11-22','1999-11-23','1999-11-24','1999-11-26')
date4 <- c('1999-11-19','1999-11-22','1999-11-23','1999-11-24')
cor_skew_kurt <- c(rep(NaN,21))
symbol <- c(rep('a',6),rep('b',5),rep('c',4),rep('d',6))
rskew <- rnorm(21,mean=1, sd =1)
rkurt <- rnorm(21, mean=5, sd = 1)
panel_data <- cbind.data.frame(date = c(date6,date5,date4,date6), symbol = symbol, rskew = rskew, rkurt = rkurt, cor_skew_kurt = cor_skew_kurt )
panel_data$date <- as.Date(panel_data$date, '%Y-%m-%d')
# Computing the cor_skew_kurt and filling the table <- ANSWER TO YOUR QUESTION
for (date in unique(panel_data$date))
{
panel_data[panel_data$date == date,"cor_skew_kurt"] <- as.double(cor(panel_data[panel_data$date == date,'rskew'],panel_data[panel_data$date == date,'rkurt']))
}