Using purrr::pmap() to assign data frame column labels


I am trying to assign labels to the columns of multiple data frames. I have more than 10 data frames to process, but here are three examples:

library(tibble)  # tribble()
library(purrr)   # pmap()

df1 = tribble(
  ~a_age, ~a01edu, ~other_vars,
  35, 17, 1,
  41, 14, 2,
  28, 12, 3,
  68, 99, 4
)

df2 = tribble(
  ~b_age, ~b01edu, ~some_vars,
  25, 10, 2,
  52, 8, 1,
  31, 20, 5
)

df3 = tribble(
  ~c_age, ~c01edu,
  55, 16,
  47, 11,
  68, 16,
  36, 6,
  29, 16
)

Each data frame has certain columns with similar names, such as a...some_name, b...some_name and so on. I used labelled::set_variable_labels() to create column labels for one data frame, and it worked fine.

df1 = df1 |> labelled::set_variable_labels(
  .labels = list("a_age" = "Age",
                 "a01edu" = "Highest education completed")
)

Output:

[screenshot: output1]

Then I tried using purrr::pmap() to assign column labels to all data frames at once but it gave me an error.

df_list = list(df1, df2, df3) |> setNames(c("a", "b", "c"))

params = tribble(
  ~x, ~y, ~z,
  "a", "a_age", "a01edu",
  "b", "b_age", "b01edu",
  "c", "c_age", "c01edu"
)

pmap(params,
     function(x, y, z) {
       df_list[[x]] |> labelled::set_variable_labels(
         .labels = list(y = "Age",
                        z = "Highest education completed")
         )
       }
     )

The error message:

<error/rlang_error>
Error in `pmap()`:
ℹ In index: 1.
Caused by error in `var_label<-.data.frame`:
! some variables not found in x:y, z
---
Backtrace:
 1. purrr::pmap(...)
 2. purrr:::pmap_("list", .l, .f, ..., .progress = .progress)
 5. global .f(x = .l[[1L]][[i]], y = .l[[2L]][[i]], z = .l[[3L]][[i]], ...)
 6. labelled::set_variable_labels(...)
 8. labelled:::`var_label<-.data.frame`(`*tmp*`, value = .labels)
 9. base::stop("some variables not found in x:", missing_names)

Why am I getting this error? I thought I had set up the params object so that the column names in df_list match the ones I pass to function(x, y, z). I'm sure there are better ways to achieve this. Any help would be much appreciated. Thank you!


Solution

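  • The problem is that = does not evaluate its left-hand side. In list(y = "Age", z = "Highest education completed") the element names are the literal strings "y" and "z", not the values held in y and z, so set_variable_labels() looks for columns called y and z. We can use !! with := inside dplyr::lst(), which evaluates the names: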

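    library(dplyr)  # lst()
    library(purrr)  # pmap()

    # ..1, ..2 and ..3 are the x, y and z columns of params for the current row
    df_list2 <- pmap(params, ~ df_list[[..1]] |>
      labelled::set_variable_labels(
        # !!name := value evaluates the name before building the list
        .labels = lst(!!..2 := "Age",
                      !!..3 := "Highest education completed")
      )
    )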
    

    -output

    [[1]]
    # A tibble: 4 × 3
      a_age a01edu other_vars
      <dbl>  <dbl>      <dbl>
    1    35     17          1
    2    41     14          2
    3    28     12          3
    4    68     99          4
    
    [[2]]
    # A tibble: 3 × 3
      b_age b01edu some_vars
      <dbl>  <dbl>     <dbl>
    1    25     10         2
    2    52      8         1
    3    31     20         5
    
    [[3]]
    # A tibble: 5 × 2
      c_age c01edu
      <dbl>  <dbl>
    1    55     16
    2    47     11
    3    68     16
    4    36      6
    5    29     16
    
    > str(df_list2)
    List of 3
     $ : tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
      ..$ a_age     : num [1:4] 35 41 28 68
      .. ..- attr(*, "label")= chr "Age"
      ..$ a01edu    : num [1:4] 17 14 12 99
      .. ..- attr(*, "label")= chr "Highest education completed"
      ..$ other_vars: num [1:4] 1 2 3 4
     $ : tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
      ..$ b_age    : num [1:3] 25 52 31
      .. ..- attr(*, "label")= chr "Age"
      ..$ b01edu   : num [1:3] 10 8 20
      .. ..- attr(*, "label")= chr "Highest education completed"
      ..$ some_vars: num [1:3] 2 1 5
     $ : tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
      ..$ c_age : num [1:5] 55 47 68 36 29
      .. ..- attr(*, "label")= chr "Age"
      ..$ c01edu: num [1:5] 16 11 16 6 16
      .. ..- attr(*, "label")= chr "Highest education completed"
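
As a minimal sketch of the same idea without tidy evaluation, the names can be attached with base R's setNames(), which receives them as ordinary values. The two small data frames below are trimmed stand-ins for the ones in the question, and the snippet assumes the labelled, purrr and tibble packages are installed; it is one possible variant, not the only way to do this.

```r
library(tibble)
library(purrr)

# Trimmed stand-ins for the question's data frames (assumption: two are enough
# to show the pattern).
df_list <- list(
  a = tribble(~a_age, ~a01edu, 35, 17, 41, 14),
  b = tribble(~b_age, ~b01edu, 25, 10, 52, 8)
)

params <- tribble(
  ~x,  ~y,      ~z,
  "a", "a_age", "a01edu",
  "b", "b_age", "b01edu"
)

# Inside list(), the left-hand side of = is taken literally, so the variable y
# is never looked up.
y <- "a_age"
literal_names <- names(list(y = "Age"))
literal_names
# [1] "y"

# setNames() takes the names as a regular character vector, so y and z are
# evaluated for each row of params.
df_list2 <- pmap(params, function(x, y, z) {
  labels <- setNames(
    list("Age", "Highest education completed"),
    c(y, z)
  )
  labelled::set_variable_labels(df_list[[x]], .labels = labels)
})

# Inspect the labels that were attached to the first data frame.
labelled::var_label(df_list2[[1]])
```

As with the pmap() call above, the result is an unnamed list; wrapping it in setNames(df_list2, params$x) would restore the a/b names if they are needed.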