rdplyrtidyr

How to pivot data and create a list


I have a dataset (dummy data set below). I need to pivot it wider but also create a list with each level being by 'Block'

Block<-c("a1","a1","a1","a1","a1","b1","b1","b1","b1") #2 block
Species<-c("c","c","c","d","d","c","e","c","e")
date<-c("2021-03-29","2021-03-30","2021-03-28","2021-03-30","2021-03-29",
"2021-03-29","2021-03-29","2021-03-28","2021-03-28")
df<-data.frame(Block, Species, date)

Below is the output I am trying to obtain. I know how to do this if I pull each block out individually and then pivot wider and then combine each into a list. However, in reality I have a much larger dataset with more dates, species, and blocks so I don't really want to do this by hand. Is there a more efficient way to accomplish this? Thank you very much.

$a1
     2021-03-28 2021-03-29 2021-03-30 
c    1              1          1   
d    0              1          1   

$b1
     2021-03-28 2021-03-29 
c    1              1             
e    1              1            

Solution

  • Assuming the numbers are counts of the Species per date, using lapply and data.frame to get the desired format with row names from Species

    library(dplyr)
    library(tidyr)
    
    summarize(df, uID = length(Species), .by = c(Species, date, Block)) %>% 
      arrange(date) %>% 
      pivot_wider(names_from = date, values_from = uID, values_fill = 0) %>% 
      split(., ~ Block) %>% 
      lapply(., \(x) 
        data.frame(x[!grepl("Block|Species", colnames(x))], 
          row.names = x$Species, check.names = F))
    

    output

    $a1
      2021-03-28 2021-03-29 2021-03-30
    c          1          1          2
    d          0          1          1
    
    $b1
      2021-03-28 2021-03-29 2021-03-30
    c          1          1          0
    e          1          1          0
    

    or (depending on how many Blocks are created maybe less performant because pivot_wider has to be applied Block times)

    library(dplyr)
    library(tidyr)
    
    split(df, ~ Block) %>% 
      lapply(., \(x) 
        arrange(x, date) %>% 
        pivot_wider(names_from = date, values_from = Block, values_fn = ~ length(.x), values_fill = 0) %>% 
        data.frame(., row.names = .$Species, check.names = F) %>%
        select(-Species))
    

    output

    $a1
      2021-03-28 2021-03-29 2021-03-30
    c          1          1          2
    d          0          1          1
    
    $b1
      2021-03-28 2021-03-29
    c          1          1
    e          1          1
    
    Data

    (slightly modified to get counts > 1)

    df <- structure(list(Block = c("a1", "a1", "a1", "a1", "a1", "a1", 
    "b1", "b1", "b1", "b1"), Species = c("c", "c", "c", "c", "d", 
    "d", "c", "e", "c", "e"), date = c("2021-03-29", "2021-03-30", 
    "2021-03-30", "2021-03-28", "2021-03-30", "2021-03-29", "2021-03-29", 
    "2021-03-29", "2021-03-28", "2021-03-28")), row.names = c("1", 
    "2", "21", "3", "4", "5", "6", "7", "8", "9"), class = "data.frame")