r dplyr tidyr

Selecting columns to pivot based on column data type


I'm looking to pivot a data frame using pivot_longer. Generally, I select columns using some pattern match (e.g. pivot_longer(cols = contains(...))). Now I want to select columns based on the data type they contain, which did not work as expected. For example, to pivot the columns containing logical data, I tried:

library(tidyr)

df <- data.frame(nums = seq(1,6,1),
                 chrs = letters[1:6],
                 logis = rep(c(TRUE, FALSE), 3),
                 logis2 = rep(c(TRUE, FALSE), 3))

pivot_longer(df,
             cols = sapply(df, is.logical))

This results in an error telling me that the cols argument only accepts numeric or character data types.

A simple solution would be to identify the columns containing a certain data type, get the names of these columns and then pivot the data using the vector of characters:

vars_to_pivot <- names(df)[sapply(df, is.logical)]

pivot_longer(df,
             cols = all_of(vars_to_pivot)) %>%
   head()

# A tibble: 6 × 4
   nums chrs  name   value
  <dbl> <chr> <chr>  <lgl>
1     1 a     logis  TRUE 
2     1 a     logis2 TRUE 
3     2 b     logis  FALSE
4     2 b     logis2 FALSE
5     3 c     logis  TRUE 
6     3 c     logis2 TRUE 

I'm curious, however, whether there is a dplyr/tidyverse verb for selecting columns within pivot functions based on a logical vector.


Solution

  • You can use tidyselect::where() here, which selects the variables for which a function returns TRUE:

    pivot_longer(df,
        cols = where(is.logical)
    )
    # A tibble: 12 × 4
        nums chrs  name   value
       <dbl> <chr> <chr>  <lgl>
     1     1 a     logis  TRUE 
     2     1 a     logis2 TRUE 
     3     2 b     logis  FALSE
     4     2 b     logis2 FALSE
     5     3 c     logis  TRUE 
     6     3 c     logis2 TRUE 
     7     4 d     logis  FALSE
     8     4 d     logis2 FALSE
     9     5 e     logis  TRUE 
    10     5 e     logis2 TRUE 
    11     6 f     logis  FALSE
    12     6 f     logis2 FALSE
    

    This is a helper function that is part of tidyselect, a domain-specific language for selecting variables. The full list of helpers (which includes all_of(), which you use in your question) is in the tidyselect documentation.
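As a minimal sketch of how where() composes with the other selection helpers, the example below rebuilds the data frame from the question and combines a type predicate with a name-based helper using the tidyselect operators & and !. The object names long_lgl and long_one are only illustrative, and the sketch assumes a tidyselect version in which where() is available inside selection contexts.

```r
library(tidyr)  # provides pivot_longer() and the tidyselect helpers

# Same data frame as in the question
df <- data.frame(nums = seq(1, 6, 1),
                 chrs = letters[1:6],
                 logis = rep(c(TRUE, FALSE), 3),
                 logis2 = rep(c(TRUE, FALSE), 3))

# Pivot every logical column (logis and logis2)
long_lgl <- pivot_longer(df, cols = where(is.logical))

# Combine the type predicate with a name-based helper:
# logical columns whose name does not end in "2" (only logis here)
long_one <- pivot_longer(df, cols = where(is.logical) & !ends_with("2"))
```

Because where() takes any function that returns a single TRUE or FALSE per column, the same pattern should work with is.numeric, is.character, or a custom predicate of your own.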