rtime-seriescorrelation

Time-series average of cross-sectional correlations


I have a panel dataset looking like this:

head(panel_data)

        date      symbol   close  rv   rv_plus rv_minus rskew rkurt  Mkt.RF  SMB   HML
    1 1999-11-19    a      25.4  19.3    6.76    12.6  -0.791  4.36  -0.11  0.35  -0.5 
    2 1999-11-22    a      26.8  10.1    6.44    3.69   0.675  5.38   0.02  0.22  -0.92
    3 1999-11-23    a      25.2  8.97    2.56    6.41  -1.04   4.00  -1.29  0.08   0.3 
    4 1999-11-24    a      25.6  5.81    2.86    2.96  -0.505  5.45   0.87  0.08  -0.89
    5 1999-11-26    a      25.6  2.78    1.53    1.25   0.617  5.60   0.23  0.92  -0.2 
    6 1999-11-29    a      26.1  5.07    2.76    2.30  -0.236  7.27  -0.6   0.570 -0.14

where the variable symbol depicts different stocks. I want to calculate the time-series average of the cross-sectional correlation between the variables rskew and rkurt. This means I need to compute the correlation between rskew and rkurt over all different stocks at each point in time and then calculate the time-series average afterwards.

I tried to do it with the rollapply function from the zoo package, but since the number of different stocks is not the same for all dates, I cannot simply define width as an integer. Here is what i tried for a sample width of 20:

panel_data <- panel_data %>%
  group_by(date) %>%
  mutate(cor_skew_kurt = rollapply(data = panel_data[7:8],
                         width=20,
                         FUN=cor,
                         align="right",
                         na.rm=TRUE,
                         fill=NA)) %>%
  ungroup

Is there a way to do this without having to define a fixed width for each date group?

Or should I maybe use a different approach to do this?


Solution

  • Can you try running the below code? I have recreated an example emulating your issue. if I understood your problem correctly this code should at least put you on the path to the right solution as it solves the issue of unequal time window length.

    ###################
    #Recreating an example dataset with unequal dates across stocks
    seed(1)
    date6 <- c('1999-11-19','1999-11-22','1999-11-23','1999-11-24','1999-11-26','1999-11-29')
    date5 <- c('1999-11-19','1999-11-22','1999-11-23','1999-11-24','1999-11-26')
    date4 <- c('1999-11-19','1999-11-22','1999-11-23','1999-11-24')
    cor_skew_kurt <- c(rep(NaN,21))
    symbol <- c(rep('a',6),rep('b',5),rep('c',4),rep('d',6))
    rskew <- rnorm(21,mean=1, sd =1)
    rkurt <- rnorm(21, mean=5, sd = 1)
    panel_data <- cbind.data.frame(date = c(date6,date5,date4,date6), symbol = symbol, rskew = rskew, rkurt = rkurt, cor_skew_kurt = cor_skew_kurt )
    panel_data$date <- as.Date(panel_data$date, '%Y-%m-%d')
    
    
    # Computing the cor_skew_kurt and filling the table  <- ANSWER TO YOUR QUESTION
    
    for (date in unique(panel_data$date))
    {
       panel_data[panel_data$date == date,"cor_skew_kurt"] <-  as.double(cor(panel_data[panel_data$date == date,'rskew'],panel_data[panel_data$date == date,'rkurt']))
      
    }