rtextgroupingdifferencecounting

Count then calculate differences from first and previous rows after grouping, but ignoring text/character values in r


From the following data grouped by id and visit, which include a mix of numeric and text/character values, how to create these 3 new columns:

Data:

dat <-  
structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), 
    visit = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
    1L, 1L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), 
    value = c("5", "7", "10", "20", "15", "text0", "25", "text1", 
    "100", "text2", "text3", "120", "text4", "50", "45"), count = c(1L, 
    2L, 3L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA, 
-15L)) 

Desired output:

> dat2
   id visit value count count_wotxt diff_value_first diff_value_previous
1   1     1     5     1           1                0                   0
2   1     1     7     2           2                2                   2
3   1     1    10     3           3                5                   3
4   1     2    20     1           1                0                   0
5   1     2    15     2           2               -5                  -5
6   1     2 text0     3          NA               NA                  NA
7   1     2    25     4           3                5                  10
8   1     2 text1     5          NA               NA                  NA
9   2     1   100     1           1                0                   0
10  2     1 text2     2          NA               NA                  NA
11  2     1 text3     3          NA               NA                  NA
12  2     1   120     4           2               20                  20
13  2     2 text4     1          NA               NA                  NA
14  2     2    50     2           1               NA                   0
15  2     2    45     3           2               NA                  -5 

Thanks for help


Solution

  • Here is one logic we could apply to get the desired output: I think the most challenging part is the difference of the previous row in the presence of NAs. I have solved this by using fill:

    library(dplyr)
    library(tidyr)
    
    dat %>% 
      group_by(id, visit) %>% 
      mutate(count_wotxt = ifelse(grepl("^[0-9]+$", value), cumsum(grepl("^[0-9]+$", value)), NA),
             value_numeric = as.numeric(as.character(value)),
             diff_value_first = value_numeric - value_numeric[1]) %>% 
      fill(value_numeric, .direction = "down") %>% 
      mutate(diff_value_previous = value_numeric - lag(value_numeric, default = first(value_numeric)), .keep="unused") %>% 
      ungroup()
    
    # A tibble: 15 × 7
       id    visit value count count_wotxt diff_value_first diff_value_previous
       <fct> <fct> <chr> <int>       <int>            <dbl>               <dbl>
     1 1     1     5         1           1                0                   0
     2 1     1     7         2           2                2                   2
     3 1     1     10        3           3                5                   3
     4 1     2     20        1           1                0                   0
     5 1     2     15        2           2               -5                  -5
     6 1     2     text0     3          NA               NA                   0
     7 1     2     25        4           3                5                  10
     8 1     2     text1     5          NA               NA                   0
     9 2     1     100       1           1                0                   0
    10 2     1     text2     2          NA               NA                   0
    11 2     1     text3     3          NA               NA                   0
    12 2     1     120       4           2               20                  20
    13 2     2     text4     1          NA               NA                  NA
    14 2     2     50        2           1               NA                  NA
    15 2     2     45        3           2               NA                  -5
    Warning message:
    There were 3 warnings in `mutate()`.
    The first warning was:
    ℹ In argument: `value_numeric = as.numeric(as.character(value))`.
    ℹ In group 2: `id = 1`, `visit = 2`.
    Caused by warning:
    ! NAs introduced by coercion
    ℹ Run dplyr::last_dplyr_warnings() to see the 2 remaining warnings.