Say we have the following indicator vector:
library(dplyr)
df <- tibble(row = 1:20,
             indicator = rep(c(rep(0, 5), 1, rep(0, 4)), 2))
df
row indicator
<int> <dbl>
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 1
7 7 0
8 8 0
9 9 0
10 10 0
11 11 0
12 12 0
13 13 0
14 14 0
15 15 0
16 16 1
17 17 0
18 18 0
19 19 0
20 20 0
How can I easily create a column that marks a region around each 1 in the indicator column? For example, if I want three “regions” of size N = 1, 3, and 5, each centered on a 1, then the desired output should look like:
row indicator region_n1 region_n3 region_n5
<int> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 0 0
2 2 0 0 0 0
3 3 0 0 0 0
4 4 0 0 0 1
5 5 0 0 1 1
6 6 1 1 1 1
7 7 0 0 1 1
8 8 0 0 0 1
9 9 0 0 0 0
10 10 0 0 0 0
11 11 0 0 0 0
12 12 0 0 0 0
13 13 0 0 0 0
14 14 0 0 0 1
15 15 0 0 1 1
16 16 1 1 1 1
17 17 0 0 1 1
18 18 0 0 0 1
19 19 0 0 0 0
20 20 0 0 0 0
I can code this up by sorting when there is only one “1” in the indicator variable, but I struggle when there are multiple “1”s. Any help is greatly appreciated, thanks.
Using a user-defined function with lag and lead:
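# Mark a window of n rows centered on each 1 in x (n is assumed to be odd).
get_region_n <- function(x, n) {
  if (n == 1) {
    return(x)
  } else {
    # Number of rows to extend on each side of a 1.
    new_n <- (n - 1) / 2
    new_x <- x
    # Add copies of x shifted by new_n, ..., 1 rows in both directions.
    # Overlapping windows add up, so values can exceed 1.
    for (i in new_n:1) {
      new_x <- new_x + lag(x, n = i, default = 0) + lead(x, n = i, default = 0)
    }
    return(new_x)
  }
}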
df %>%
  mutate(region_n1 = get_region_n(indicator, 1),
         region_n3 = get_region_n(indicator, 3),
         region_n5 = get_region_n(indicator, 5))
row indicator region_n1 region_n3 region_n5
<int> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 0 0
2 2 0 0 0 0
3 3 0 0 0 0
4 4 0 0 0 1
5 5 0 0 1 1
6 6 1 1 1 1
7 7 0 0 1 1
8 8 0 0 0 1
9 9 0 0 0 0
10 10 0 0 0 0
11 11 0 0 0 0
12 12 0 0 0 0
13 13 0 0 0 0
14 14 0 0 0 1
15 15 0 0 1 1
16 16 1 1 1 1
17 17 0 0 1 1
18 18 0 0 0 1
19 19 0 0 0 0
20 20 0 0 0 0
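If two 1s sit closer together than the window size, summing shifted copies counts the overlap twice (for example, an indicator of 0, 1, 0, 1, 0 with N = 3 gives 2 in the middle row). As a rough alternative, here is a minimal base R sketch that checks the distance to the nearest 1 instead, so the result stays 0/1. The name mark_region is hypothetical, and the sketch assumes the indicator holds only 0/1 values and that n is a positive odd integer.

```r
# Hypothetical helper (not part of the answer above): returns 1 for every
# position within (n - 1) / 2 rows of a 1 in x, and 0 elsewhere.
# Assumes x contains only 0/1 and n is a positive odd integer.
mark_region <- function(x, n) {
  half_width <- (n - 1) %/% 2
  hits <- which(x == 1)  # positions of the 1s
  vapply(seq_along(x), function(i) {
    # TRUE if any 1 lies within half_width rows of position i
    as.numeric(any(abs(i - hits) <= half_width))
  }, numeric(1))
}

indicator <- rep(c(rep(0, 5), 1, rep(0, 4)), 2)
region_n5 <- mark_region(indicator, 5)
which(region_n5 == 1)  # rows 4 to 8 and 14 to 18

# Two 1s close together: overlapping windows still give 0/1.
mark_region(c(0, 1, 0, 1, 0), 3)
```

It should drop into the same pipeline, e.g. df %>% mutate(region_n3 = mark_region(indicator, 3)). It compares every row against every 1, so for very long vectors with many 1s the lag/lead version is likely to be faster.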