r, if-statement, dplyr, brackets

Differences between two columns having missing values in one


This is an example of my data in long (longitudinal) format, containing:

  1. id = Individuals
  2. grup_int = categorical variable of group
  3. gen = variables to be measured
  4. time = point in time for id and gene
  5. value

        id grup_int gen   time   value
     <dbl>    <dbl> <chr> <chr>  <dbl>
1 60801001        1 adrb2 2     2.11  
2 60801001        1 ccl2  1     0.941 
3 60801001        1 ccl2  2     0.248 
4 60801001        1 ccl3  1     5.65  
5 60801001        1 ccl3  2     NA

What I want to do across my whole dataset is subtract the value at time = 1 from the value at time = 2 (post minus pre) for each id and gen, something like this:


df %>% dplyr::group_by(id, gen) %>% dplyr::mutate(d_post_pre = value[time == 2] - value[time == 1])

`d_post_pre` must be size 1, not 0.
ℹ The error occurred in group 1: id = 60801001, gen = "adrb2".

As you can see, for the adrb2 gen there is no entry at all for time == 1, so value[time == 1] is empty and this error is thrown. The possibilities to be found in value are:

Can you suggest a way to make this line robust to missing entries, or to insert some placeholder text or string that I can filter out afterwards?

Thanks in advance!


df <- structure(list(id = structure(c(60801001, 60801001, 60801001, 
60801001, 60801001), label = "Identificador", format.spss = "F9.0", display_width = 14L), 
    grup_int = structure(c(1, 1, 1, 1, 1), format.spss = "F2.0"), 
    gen = c("adrb2", "ccl2", "ccl2", "ccl3", "ccl3"), time = c("2", 
    "1", "2", "1", "2"), value = c(2.1098254, 0.94088, 0.24778089, 
    5.6529145, 0.06939283)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))


Solution

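  • What about something like this? Use pivot_wider to create separate value columns for time == 1 and time == 2, and then simply subtract one column from the other. A gen with no time == 1 row gets NA in that column, so the difference is NA instead of an error. Then you can convert back to long format.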

    df |> 
      tidyr::pivot_wider(
        id_cols = c("id", "grup_int", "gen"),
        names_from = "time",
        names_prefix = "time_",
        values_from = "value"
      ) |> 
      dplyr::mutate(d_post_pre = time_2 - time_1) |> 
      tidyr::pivot_longer(
        cols = c("time_1", "time_2"),
        names_to = "time",
        names_prefix = "time_"
      ) |>
      dplyr::filter(!is.na(value)) |> 
      dplyr::select(id, grup_int, gen, time, value, d_post_pre, everything())
    
    # # A tibble: 5 × 6
    #         id grup_int gen   time    value d_post_pre
    #      <dbl>    <dbl> <chr> <chr>   <dbl>      <dbl>
    # 1 60801001        1 adrb2 2      2.11       NA    
    # 2 60801001        1 ccl2  1      0.941      -0.693
    # 3 60801001        1 ccl2  2      0.248      -0.693
    # 4 60801001        1 ccl3  1      5.65       -5.58 
    # 5 60801001        1 ccl3  2      0.0694     -5.58
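
If you would rather stay in long format, here is a minimal sketch of an alternative. It relies on the fact that indexing an empty vector with [1] returns NA in R, so each side of the subtraction always has length 1. It assumes there is at most one row per id, gen and time; if there were duplicates, [1] would silently take the first one. The data frame below just re-creates the values from the dput in the question as a plain data.frame, and the name res is only for illustration.

```r
library(dplyr)

# Same values as the dput in the question, as a plain data.frame.
df <- data.frame(
  id       = rep(60801001, 5),
  grup_int = rep(1, 5),
  gen      = c("adrb2", "ccl2", "ccl2", "ccl3", "ccl3"),
  time     = c("2", "1", "2", "1", "2"),
  value    = c(2.1098254, 0.94088, 0.24778089, 5.6529145, 0.06939283)
)

res <- df |>
  group_by(id, gen) |>
  # value[cond] can be empty (length 0) when a time point is missing;
  # taking [1] of an empty vector gives NA, so the result is always length 1.
  mutate(d_post_pre = value[time == "2"][1] - value[time == "1"][1]) |>
  ungroup()

res
```

Groups that lack one of the two time points (adrb2 here) end up with d_post_pre = NA, which you can drop afterwards with dplyr::filter(!is.na(d_post_pre)) if needed.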