rdplyr

How to use if statements inside of groups in a dataframe in R?


I would like to perform rowwise if statements with a condition that is colwise. Preferably using only base R and dplyr.

library(dplyr)

id <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 2, 3)
year <- c(2015, 2015, 2016, 2016, 2017, 2015, 2016, 2016, 2017, 2017, 2018, 2015, 2016, 2018, 2017, 2018, 2018, 2015, 2016, 2018)
value <- c(100, NA, 150, 0, 200, 120, 0, NA, 130, 140, 0, 160, 170, NA, 180, 190, 200, 0, 150, 160)

df <- data.frame(id, year, value)

df <- df %>%
  group_by(id, year) %>%
  mutate(value = if_else(0 && any(value > 0), NA, value))

Specifically, I want to replace 0 with an NA, if there are other non-zero values in the same column, in the same group.

Later on, I would choose the row with least NAs for all the duplicate rows.

The above code throws an error, because the value in the false of if_else() returns the whole column (inside of the group) as a vector.

I would imagine something like either a vectorised if() (without else), or a placeholder for the if_else() that basically says "do nothing". A dplyr placeholder for rows or the specific value could also help, but I couldn't find anything appropriate.

If someone has a general approach on how to do these kind "row-by-row with group context operations", I would appreciate any insights.


Solution

  • I don't think the example data includes any examples of a group with a zero but no non-NA non-zeroes, so I add one here at the bottom.

    df |>
      bind_rows(data.frame(id = 10, year = 2015, value = c(0,0,NA))) |>
      mutate(value2 = if_else(any(value[!is.na(value)] != 0) & value == 0, NA, value), 
             .by = c(id, year))
    
       
    

    The if_else here checks if the group defined with .by = c(id, year) has any non-NA, non-zero values, AND the current value is zero, in which case we'd switch to NA.

    Here, rows 4, 7, 11, and 18 are changed 0 to NA. But rows 21-22 are left as is, since that group has no valid values.

       id year value value2
    1   1 2015   100    100
    2   1 2015    NA     NA
    3   1 2016   150    150
    4   1 2016     0     NA
    5   1 2017   200    200
    6   2 2015   120    120
    7   2 2016     0     NA
    8   2 2016    NA     NA
    9   3 2017   130    130
    10  3 2017   140    140
    11  3 2018     0     NA
    12  4 2015   160    160
    13  4 2016   170    170
    14  4 2018    NA     NA
    15  5 2017   180    180
    16  5 2018   190    190
    17  5 2018   200    200
    18  1 2015     0     NA
    19  2 2016   150    150
    20  3 2018   160    160
    21 10 2015     0      0
    22 10 2015     0      0
    23 10 2015    NA     NA