[SOLVED] Create a sequence before an indicator variable

I'm planning to do a hazard analysis, but first I want to clean my dataset so that I keep only the data from right before a "death", if you will. I'm studying countries, and since countries don't "die" per se, I need to find the point where an event occurs, coded as a '1' in an indicator column, and then generate a column that is 0 everywhere except for the n periods ending at the point where my indicator column hits '1'.

For example, if my data were the indicator vector below, I would be looking for a way to generate the lag_column vector.

number_of_years = 5
year = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
indicator = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
lag_column = c(0, 1, 1, 1, 1, 1, 0, 0, 0, 0) # I need to make this: the 5 years up to and including the event

Thank you!

Solution

I'm sure there is a better way to do this. Having said that, here is what worked for me.

Sample data

library(tibble)

df <- tibble(year = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
             indicator = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0))

Note that I added a second 1 to the data (in year 9) to check what happens when the windows overlap.

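number_of_years <- 5

# Row positions where the event occurs
index <- which(df$indicator == 1)
lag_index <- integer(0)
for (ii in seq_along(index)) {
  # The event row plus the number_of_years - 1 rows before it, clipped at row 1
  lag_spots <- seq(from = max(1, index[ii] - number_of_years + 1), to = index[ii])
  lag_index <- append(lag_index, lag_spots)
}

# Overlapping windows only need to be marked once
lag_index <- unique(lag_index)

df$lag_column <- 0
df$lag_column[lag_index] <- 1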
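
For anyone who would rather avoid the loop, the same idea can probably be written with outer() in base R. This is only a sketch: the function name mark_lead_window is made up for illustration, and it assumes the rows are already sorted by year and that the window includes the event row itself, as in the question's example.

```r
# Hypothetical helper: flag each event row and the (n - 1) rows before it.
mark_lead_window <- function(indicator, n) {
  events <- which(indicator == 1)              # row positions of the events
  out <- rep(0, length(indicator))
  if (length(events) == 0) return(out)         # no event, nothing to flag
  # One row per event, one column per backwards offset 0..(n - 1)
  spots <- outer(events, 0:(n - 1), "-")
  spots <- unique(spots[spots >= 1])           # drop positions before row 1
  out[spots] <- 1
  out
}

# Example from the question: one event in year 6, n = 5
mark_lead_window(c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0), 5)
# 0 1 1 1 1 1 0 0 0 0

# Overlap case from this answer: events in years 6 and 9
mark_lead_window(c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0), 5)
# 0 1 1 1 1 1 1 1 1 0
```

Dropping positions below 1 matters when an event falls in the first few rows; otherwise R would see zero or negative subscripts.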

Output

> df
# A tibble: 10 x 3
    year indicator lag_column
   <dbl>     <dbl>      <dbl>
 1     1         0          0
 2     2         0          1
 3     3         0          1
 4     4         0          1
 5     5         0          1
 6     6         1          1
 7     7         0          1
 8     8         0          1
 9     9         1          1
10    10         0          0
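
Since the question is about several countries, the window probably should not spill from one country's rows into the previous country's. One way to handle that, sketched here with base R's ave(), is to build the flag within each country separately. The panel below, the column name country, and the helper flag_window are all made up for illustration; the sketch assumes rows are sorted by year within each country.

```r
# Hypothetical two-country panel; the `country` column is an assumption.
panel <- data.frame(
  country   = rep(c("A", "B"), each = 6),
  year      = rep(1:6, times = 2),
  indicator = c(0, 0, 0, 0, 1, 0,   # A: event in year 5
                0, 1, 0, 0, 0, 0)   # B: event in year 2
)
n <- 3  # window length, including the event year

# Flag the event row and the (n - 1) rows before it within one country
flag_window <- function(x) {
  out <- rep(0, length(x))
  for (ev in which(x == 1)) {
    out[max(1, ev - n + 1):ev] <- 1
  }
  out
}

# ave() applies flag_window to each country's rows and keeps the row order
panel$lag_column <- ave(panel$indicator, panel$country, FUN = flag_window)
panel$lag_column
# 0 0 1 1 1 0 1 1 0 0 0 0
```

Without the grouping, the window for country B's event in year 2 would reach back into the last row of country A.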