r, vector, region

Determine regions around elements in a vector in R


Say we have the following indicator vector:

library(dplyr)
df <- tibble(row = 1:20,
             indicator = rep(c(rep(0, 5), 1, rep(0, 4)), 2))
df

     row indicator
   <int>     <dbl>
 1     1         0
 2     2         0
 3     3         0
 4     4         0
 5     5         0
 6     6         1
 7     7         0
 8     8         0
 9     9         0
10    10         0
11    11         0
12    12         0
13    13         0
14    14         0
15    15         0
16    16         1
17    17         0
18    18         0
19    19         0
20    20         0


How can I easily create a column that marks a region around each 1 in the indicator column? For example, if I want three “regions” of width N = 1, 3, and 5, each centered on a 1, the desired output should look like:

     row indicator region_n1 region_n3 region_n5
   <int>     <dbl>     <dbl>     <dbl>     <dbl>
 1     1         0         0         0         0
 2     2         0         0         0         0
 3     3         0         0         0         0
 4     4         0         0         0         1
 5     5         0         0         1         1
 6     6         1         1         1         1
 7     7         0         0         1         1
 8     8         0         0         0         1
 9     9         0         0         0         0
10    10         0         0         0         0
11    11         0         0         0         0
12    12         0         0         0         0
13    13         0         0         0         0
14    14         0         0         0         1
15    15         0         0         1         1
16    16         1         1         1         1
17    17         0         0         1         1
18    18         0         0         0         1
19    19         0         0         0         0
20    20         0         0         0         0

I can code this up by sorting when there is only one “1” in the indicator variable, but I struggle when there are multiple “1”s. Any help is greatly appreciated, thanks.


Solution

  • Using a user-defined function with dplyr's lag and lead:

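    library(dplyr)  # lag() and lead() here are dplyr's, not stats::lag()

    # Flag every position within (n - 1) / 2 steps of a 1 in x (n must be odd).
    get_region_n <- function(x, n) {
      half_width <- (n - 1) / 2
      new_x <- x
      # Add copies of x shifted i steps in each direction.
      for (i in seq_len(half_width)) {
        new_x <- new_x + lag(x, n = i, default = 0) + lead(x, n = i, default = 0)
      }
      # Overlapping regions would sum above 1, so cap the flag at 1.
      pmin(new_x, 1)
    }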
    
    df %>% mutate(region_n1 = get_region_n(indicator, 1),
                  region_n3 = get_region_n(indicator, 3),
                  region_n5 = get_region_n(indicator, 5))
    
         row indicator region_n1 region_n3 region_n5
       <int>     <dbl>     <dbl>     <dbl>     <dbl>
     1     1         0         0         0         0
     2     2         0         0         0         0
     3     3         0         0         0         0
     4     4         0         0         0         1
     5     5         0         0         1         1
     6     6         1         1         1         1
     7     7         0         0         1         1
     8     8         0         0         0         1
     9     9         0         0         0         0
    10    10         0         0         0         0
    11    11         0         0         0         0
    12    12         0         0         0         0
    13    13         0         0         0         0
    14    14         0         0         0         1
    15    15         0         0         1         1
    16    16         1         1         1         1
    17    17         0         0         1         1
    18    18         0         0         0         1
    19    19         0         0         0         0
    20    20         0         0         0         0
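
  • A possible base R alternative, offered as a rough sketch rather than as part of the answer above: compute the distance from every position to every 1 and flag positions that fall within the half-width. The helper name region_around is hypothetical, n is assumed to be odd, and the indicator is assumed to contain only 0s and 1s.

```r
# Hypothetical helper (not from the original answer): flag every position
# that lies within (n - 1) / 2 steps of any 1 in x. Assumes n is odd.
region_around <- function(x, n) {
  half_width <- (n - 1) %/% 2
  hits <- which(x == 1)               # positions of the 1s
  idx <- seq_along(x)
  # outer() gives a length(x) by length(hits) matrix of signed distances.
  near <- abs(outer(idx, hits, "-")) <= half_width
  # A position belongs to a region if it is near at least one hit.
  as.numeric(rowSums(near) > 0)
}

indicator <- rep(c(rep(0, 5), 1, rep(0, 4)), 2)
region_n1 <- region_around(indicator, 1)
region_n3 <- region_around(indicator, 3)
region_n5 <- region_around(indicator, 5)
```

    For the example vector this should reproduce the region_n1, region_n3, and region_n5 columns shown above. Because it tests membership rather than summing shifted copies, regions that overlap still yield 1. The distance matrix grows with length(x) times the number of 1s, so the lag/lead approach may be preferable for very long vectors.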