
Create a sequence before an indicator variable


I'm looking to do hazard analysis, but before I do that I want to clean my dataset so that I have only the data from right before a "death", if you will. I'm studying countries, and since countries don't "die" per se, I need to find the point where an event occurs, coded as a '1' in an indicator column, and then generate a column that is 0 everywhere except for the n periods leading up to the point where my indicator column hits '1'.

For example, if my data were the indicator vector below, I would be looking for a way to generate lag_column.

number_of_years = 5
year = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
indicator = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
lag_column = c(0, 1, 1, 1, 1, 1, 0, 0, 0, 0) # I need to make this: 1 for the 5 years up to and including the event

Thank you!


Solution

  • I'm sure there is a better way to do this. Having said that, here is what worked for me.

    Sample data (note the second event added at year 9)

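    library(tibble)

    number_of_years <- 5
    df <- tibble(year = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 indicator = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0))
    
    # Row positions where the event occurs
    index <- which(df$indicator == 1)

    # Collect each event row plus the (number_of_years - 1) rows before it
    lag_index <- integer(0)
    for (ii in seq_along(index)) {
      # max(1, ...) keeps the window inside the data when an event occurs early
      lag_spots <- seq(from = max(1, index[ii] - number_of_years + 1), to = index[ii])
      lag_index <- append(lag_index, lag_spots)
    }
    
    lag_index <- unique(lag_index)
    
    # Start with all zeros, then flag every row that falls inside a window
    df$lag_column <- rep(0, times = nrow(df))
    df$lag_column[lag_index] <- 1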
    

    Output

    > df
    # A tibble: 10 x 3
        year indicator lag_column
       <dbl>     <dbl>      <dbl>
     1     1         0          0
     2     2         0          1
     3     3         0          1
     4     4         0          1
     5     5         0          1
     6     6         1          1
     7     7         0          1
     8     8         0          1
     9     9         1          1
    10    10         0          0
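
    The same windowing idea can also be sketched as a small base R function, which may be easier to reuse. This is only an illustration, not part of the original answer: the name flag_before_event and its arguments are hypothetical, and it assumes the rows are already sorted by year and that n is at least 1.

```r
# Hypothetical helper (not from the original answer): returns a 0/1 vector that
# is 1 on each event row and on the (n - 1) rows before it.
flag_before_event <- function(indicator, n) {
  event_rows <- which(indicator == 1)
  # For each event, build its window of row positions, clipped at row 1
  window_rows <- unlist(lapply(event_rows, function(i) seq(max(1, i - n + 1), i)))
  flag <- numeric(length(indicator))
  flag[window_rows] <- 1
  flag
}

# The example from the question: one event in year 6, window of 5 years
indicator <- c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
lag_column <- flag_before_event(indicator, n = 5)
print(lag_column)
# [1] 0 1 1 1 1 1 0 0 0 0
```

    With several countries in one data frame, the function would presumably need to be applied within each country so that a window does not spill over from one country into the previous one, for example with ave(df$indicator, df$country, FUN = function(x) flag_before_event(x, 5)), where the country column is an assumption about how the real data are laid out.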