rdplyrsubsetpanel-data

Subsetting Panel Data conditional on consecutive strings of length


I'm stuck trying to subset some panel data, i.e. ids within group, using dplyr.

I want to exact all ids, within each group, grp that has a NUM series with a minimum smaller than 2 and a maximum greater than 2. I've constructed a minimal working example below that should illustrate the issue.

I have been working with filter(), row_number() == c(1,n()), and tried to separate it out and merge, i.e. different types of _join, it back together. What can I try next?

What I got

A tibble like this,

df <- tibble(id = rep(0:1, c(8, 13)), grp = rep(c("01", "02"), c(13, 8)),
             NUM = c(-4, -3, -2, -1, 1, 2, 3, 4, -3, -2, -1,
                      1, 2, -3, -2, -1, 1, 2, 3, 4, 5)) %>% group_by(id, grp)
df %>% print(n=21)
#> # A tibble: 21 x 3
#> # Groups:   id, grp [3]
#>       id   grp   NUM
#>    <int> <chr> <dbl>
#>  1     0    01    -4
#>  2     0    01    -3
#>  3     0    01    -2
#>  4     0    01    -1
#>  5     0    01     1
#>  6     0    01     2
#>  7     0    01     3
#>  8     0    01     4
#>  9     1    01    -3
#> 10     1    01    -2
#> 11     1    01    -1
#> 12     1    01     1
#> 13     1    01     2
#> 14     1    02    -3
#> 15     1    02    -2
#> 16     1    02    -1
#> 17     1    02     1
#> 18     1    02     2
#> 19     1    02     3
#> 20     1    02     4
#> 21     1    02     5

What I am trying to get / desired outcome

df_out <- tibble(id = rep(0:1, c(9, 8)),
             grp = rep(c("01", "02"), c(9, 8)),
             NUM = c(-4, -3, -2, -1, 1, 2, 3,
           4, 5, -3, -2, -1, 1, 2, 3, 4, 5)) %>%  group_by(id, grp)
df_out
#> # A tibble: 17 x 3
#> # Groups:   id, grp [3]
#>       id   grp   NUM
#>    <int> <chr> <dbl>
#>  1     0    01    -4
#>  2     0    01    -3
#>  3     0    01    -2
#>  4     0    01    -1
#>  5     0    01     1
#>  6     0    01     2
#>  7     0    01     3
#>  8     0    01     4
#>  9     1    02    -3
#> 10     1    02    -2
#> 11     1    02    -1
#> 12     1    02     1
#> 13     1    02     2
#> 14     1    02     3
#> 15     1    02     4
#> 16     1    02     5

Solution

  • Like so?

    library(dplyr)
    filter(df, any(NUM > 2) & any(NUM < -2))
    
    # A tibble: 16 x 3
    # Groups:   id, grp [2]
          id grp     NUM
       <int> <chr> <dbl>
     1     0 01    -4.00
     2     0 01    -3.00
     3     0 01    -2.00
     4     0 01    -1.00
     5     0 01     1.00
     6     0 01     2.00
     7     0 01     3.00
     8     0 01     4.00
     9     1 02    -3.00
    10     1 02    -2.00
    11     1 02    -1.00
    12     1 02     1.00
    13     1 02     2.00
    14     1 02     3.00
    15     1 02     4.00
    16     1 02     5.00
    

    In addition, if one whichs to subset on an exact values, say the first NUM is -3 and last NUM is 5, i.e. row 9-16 in the original data, tThis can be done like this,

    df %>%
      group_by(id, grp) %>%
      mutate(first = first(NUM)
            ,last = last(NUM)) %>%
      filter(-3 == first & 5 == last) %>%
      select(-first, -last)  
    #> # A tibble: 8 x 3
    #> # Groups:   id, grp [1]
    #>      id   grp   NUM
    #>   <int> <chr> <dbl>
    #> 1     1    02    -3
    #> 2     1    02    -2
    #> 3     1    02    -1
    #> 4     1    02     1
    #> 5     1    02     2
    #> 6     1    02     3
    #> 7     1    02     4
    #> 8     1    02     5
    

    The above is inspired by this SO answer.