r list tidyverse

Dynamically extract elements from a list column


I have the following data:

df <- structure(list(id = c("1358792", "1358792", "333482", "333482", "747475", "747475"),
                     x = c("123", "123", "456", "456", NA, NA),
                     all_x = list("123", "123",
                                  c("456", "789"),
                                  c("456", "789"),
                                  list(),
                                  list())),
                row.names = c(NA, -6L),
                class = "data.frame")
    
       id    x    all_x
1 1358792  123      123
2 1358792  123      123
3  333482  456 456, 789
4  333482  456 456, 789
5  747475 <NA>     NULL
6  747475 <NA>     NULL

The all_x column is a list column whose elements are either empty (shown as NULL), a single character string, or a character vector.

I want to create a new column (tidyverse style) with the following logic: when an all_x element has one or no value, just take the value from x. If it has more than one value (i.e. is a character vector of length two or more), group by id and take the element whose position matches the row number within the group: for the first row of an id, take the first element of the character vector; for the second row, take the second element; and so on.

Desired output would be an additional character column with the respective values, i.e.

       id    x    all_x   x2
1 1358792  123      123  123
2 1358792  123      123  123
3  333482  456 456, 789  456
4  333482  456 456, 789  789
5  747475 <NA>     NULL <NA>
6  747475 <NA>     NULL <NA>

I have tried tons of variants with if_else, ifelse and unlisting and indexing, but still always get errors due to the mixed structure of the all_x column.

Here's the closest I got:

library(tidyverse)
df |>
  mutate(x2 = if_else(lengths(all_x) > 1, all_x[[1]][row_number()], x), .by = id)

However, obviously, I'm not successful.


Solution

  • I think you should use ifelse (rather than if_else), which is the smallest change needed to make your attempt work

    > df |>
    +     mutate(x2 = ifelse(lengths(all_x) > 1, all_x[[1]][row_number()], x), .by = id)
           id    x    all_x   x2
    1 1358792  123      123  123
    2 1358792  123      123  123
    3  333482  456 456, 789  456
    4  333482  456 456, 789  789
    5  747475 <NA>     NULL <NA>
    6  747475 <NA>     NULL <NA>
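
To see why swapping the function is enough, it may help to look at one group in isolation. The sketch below rebuilds the last group (id 747475) by hand; the variable names `cond`, `yes`, `res_base` and `res_dplyr` are made up for illustration, and `seq_along(x)` stands in for `row_number()` outside of `mutate()`.

```r
library(dplyr)

# The last group from the question (id 747475): x is NA, all_x holds empty lists
x     <- c(NA_character_, NA_character_)
all_x <- list(list(), list())

cond <- lengths(all_x) > 1        # FALSE FALSE
yes  <- all_x[[1]][seq_along(x)]  # indexing an empty list yields a list of NULLs

# Base ifelse() only reads `yes` where the test is TRUE, so the list is never used
res_base <- ifelse(cond, yes, x)

# dplyr::if_else() checks `true` against `false` up front and rejects list vs. character
res_dplyr <- tryCatch(
  if_else(cond, yes, x),
  error = function(e) "if_else() raised an error"
)
```

Here `res_base` is a plain character vector of missing values, while the `if_else()` call ends up in the error handler.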
    

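    Note: the difference between if_else and ifelse matters here: dplyr's if_else requires its true and false arguments to have compatible types, whereas base ifelse performs no such check.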
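
The ifelse call above reads `all_x[[1]]`, so it assumes every row of an id carries the same all_x value. As an alternative sketch (not the approach from the answer above), the same logic can be written row by row with `purrr::pmap_chr()`, which avoids mixing list and character inputs in one vectorised call. The names `vals`, `i`, `fallback` and `res` are illustrative, and the code assumes that every all_x element with more than one value is a character vector.

```r
library(dplyr)
library(purrr)

# Data from the question
df <- structure(
  list(id = c("1358792", "1358792", "333482", "333482", "747475", "747475"),
       x = c("123", "123", "456", "456", NA, NA),
       all_x = list("123", "123",
                    c("456", "789"), c("456", "789"),
                    list(), list())),
  row.names = c(NA, -6L),
  class = "data.frame"
)

res <- df |>
  mutate(
    x2 = pmap_chr(
      list(all_x, row_number(), x),
      # vals: this row's all_x element, i: row number within the id,
      # fallback: this row's x value
      \(vals, i, fallback) if (length(vals) > 1) vals[i] else fallback
    ),
    .by = id
  )
```

With this version each row is decided on its own all_x element, and a row number beyond the length of the vector yields NA rather than an error.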