rdata.tablelocf

R carry forward last observation n times by group


This one is driving me nuts. I have a large data.table with monthly stock data. Every June I assign every stock to one of 10 portfolios based on an accounting variable. I would like to carry forward the assigned portfolio variable to the next 11 month until each stock gets assigned to a new portfolio 1 to 10 in June next year. na.locf is basically what I'm looking for but I am running into 2 issues:

  1. Some stocks lack sufficient accounting data the next year, so they shouldn't be assigned to a portfolio in that year (i.e. portfolio variable should stay NA). But of course na.locf keeps carrying forward the portfolio number until there is a new one.
  2. Some stocks may get delisted after e.g. 3 months so they don't have another 11 month of data.

That's why I looking for a code that carries forward the last observation a maxium of 11 times until June next year (when there is a new portfolio number).

That's the na.locf solution right now with the 2 issues (PERMNO is the stock identifier):

COMPUSTAT_CRSP_IBES1[,
                     Portfolio_Monthly := na.locf(Portfolio_Monthly, 
                                                  na.rm = FALSE),
                     by = PERMNO]

I tried to use rep but that didn't work at all:

COMPUSTAT_CRSP_IBES1[,
                     Portfolio_Monthly := if_else(!is.na(Portfolio_Monthly), 
                                                  rep(Portfolio_Monthly, 11), 
                                                  NA), 
                     by = PERMNO]

Thank's for any hints!


Solution

  • You can create and/or use your fiscal year (June - May) as one of the group by criteria in your na.locf solution

    #show data before calculations
    data.frame(dat)
    
    #demo FY calculation
    dat[, FY := year(MONTH) + as.numeric(month(MONTH) >= 6)]
    
    #actual code
    dat[, Portfolio_Monthly := zoo::na.locf(Portfolio_Monthly, na.rm=FALSE),
        by=list(PERMNO, year(MONTH) + as.numeric(month(MONTH) >= 6))]
    
    #show results
    data.frame(dat)
    

    sample data:

    library(data.table)
    set.seed(0L)
    dat <- data.table(PERMNO=rep(LETTERS[1:12], each=20), 
        MONTH=rep(seq(as.Date("2000-01-01"), by="1 month", length.out=20), 12),
        Portfolio_Monthly=NA_real_)
    for (i in sample(1:dat[,.N], 5)) {
        set(dat, i, 3L, rnorm(1))   
    }
    setorder(dat, PERMNO, MONTH)