rloopsspread

How to loop over the columns in a dataframe, apply spread, and create a new dataframe in R?


I have a dataframe which looks like this example, just much larger:

Name  date         var1  var2  var3 
Peter 2020-03-30   0.4   0.5   0.2
Ben   2020-10-14   0.6   0.4   0.1
Mary  2020-12-06   0.7   0.2   0.9

I want to create a new dataframe for each variable (i.e., var1, var2, var3), which should look like this, e.g., for var1:

date         Peter    Ben    Mary
2020-03-30   0.4      NA     NA
2020-10-14   NA       0.6    NA
2020-12-06   NA       NA     0.7

I can do it with spread for one variable at a time:

df_new <-tidyr::spread(df[,-c(2:3)], name, var1)

But I could not figure out how to loop it over all columns as I am new to R.

Thank you!


Solution

  • First we want to create a list of data frames and then pivot each one:

    library(tidyverse)
    res_list = dat %>% 
       pivot_longer(cols = contains("var")) %>% 
       split(., .$name) %>% 
       map(. %>% pivot_wider(names_from="Name"))
    
    $var1
    # A tibble: 3 × 5
      date       name  Peter   Ben  Mary
      <date>     <chr> <dbl> <dbl> <dbl>
    1 2020-03-30 var1    0.4  NA    NA  
    2 2020-10-14 var1   NA     0.6  NA  
    3 2020-12-06 var1   NA    NA     0.7
    
    $var2
    # A tibble: 3 × 5
      date       name  Peter   Ben  Mary
      <date>     <chr> <dbl> <dbl> <dbl>
    1 2020-03-30 var2    0.5  NA    NA  
    2 2020-10-14 var2   NA     0.4  NA  
    3 2020-12-06 var2   NA    NA     0.2
    
    $var3
    # A tibble: 3 × 5
      date       name  Peter   Ben  Mary
      <date>     <chr> <dbl> <dbl> <dbl>
    1 2020-03-30 var3    0.2  NA    NA  
    2 2020-10-14 var3   NA     0.1  NA  
    3 2020-12-06 var3   NA    NA     0.9
    

    Then you can access them like

    res_list["var1"]
    
    # A tibble: 3 × 5
      date       name  Peter   Ben  Mary
      <date>     <chr> <dbl> <dbl> <dbl>
    1 2020-03-30 var1    0.4  NA    NA  
    2 2020-10-14 var1   NA     0.6  NA  
    3 2020-12-06 var1   NA    NA     0.7