I have a dataset (dummy data set below). I need to pivot it wider but also create a list with each level being by 'Block'
Block<-c("a1","a1","a1","a1","a1","b1","b1","b1","b1") #2 block
Species<-c("c","c","c","d","d","c","e","c","e")
date<-c("2021-03-29","2021-03-30","2021-03-28","2021-03-30","2021-03-29",
"2021-03-29","2021-03-29","2021-03-28","2021-03-28")
df<-data.frame(Block, Species, date)
Below is the output I am trying to obtain. I know how to do this if I pull each block out individually and then pivot wider and then combine each into a list. However, in reality I have a much larger dataset with more dates, species, and blocks so I don't really want to do this by hand. Is there a more efficient way to accomplish this? Thank you very much.
$a1
2021-03-28 2021-03-29 2021-03-30
c 1 1 1
d 0 1 1
$b1
2021-03-28 2021-03-29
c 1 1
e 1 1
Assuming the numbers are counts of the Species per date,
using lapply
and data.frame
to get the desired format with row names from Species
library(dplyr)
library(tidyr)
summarize(df, uID = length(Species), .by = c(Species, date, Block)) %>%
arrange(date) %>%
pivot_wider(names_from = date, values_from = uID, values_fill = 0) %>%
split(., ~ Block) %>%
lapply(., \(x)
data.frame(x[!grepl("Block|Species", colnames(x))],
row.names = x$Species, check.names = F))
output
$a1
2021-03-28 2021-03-29 2021-03-30
c 1 1 2
d 0 1 1
$b1
2021-03-28 2021-03-29 2021-03-30
c 1 1 0
e 1 1 0
or (depending on how many Blocks are created maybe less performant because pivot_wider
has to be applied Block times)
library(dplyr)
library(tidyr)
split(df, ~ Block) %>%
lapply(., \(x)
arrange(x, date) %>%
pivot_wider(names_from = date, values_from = Block, values_fn = ~ length(.x), values_fill = 0) %>%
data.frame(., row.names = .$Species, check.names = F) %>%
select(-Species))
output
$a1
2021-03-28 2021-03-29 2021-03-30
c 1 1 2
d 0 1 1
$b1
2021-03-28 2021-03-29
c 1 1
e 1 1
(slightly modified to get counts > 1)
df <- structure(list(Block = c("a1", "a1", "a1", "a1", "a1", "a1",
"b1", "b1", "b1", "b1"), Species = c("c", "c", "c", "c", "d",
"d", "c", "e", "c", "e"), date = c("2021-03-29", "2021-03-30",
"2021-03-30", "2021-03-28", "2021-03-30", "2021-03-29", "2021-03-29",
"2021-03-29", "2021-03-28", "2021-03-28")), row.names = c("1",
"2", "21", "3", "4", "5", "6", "7", "8", "9"), class = "data.frame")