I need to convert a huge dataset into a matrix. The data structure likes the below data "x". When I use the function tapply to do it (see below), it cannot work due to memory limit for the huge dataset. I am wondering if there is another way to do it. I tried for the R package dplyr, but didn't figure out how to do it. Any suggestion for that? Thanks so much.
x <- data.frame(c1=c("A","B","C","A","B","C"), c2=1:6, c3=c("sp1","sp2","sp2","sp2","sp3","sp4"))
y <- tapply(x$c2, list(x$c1, x$c3), sum)
You can use pivot_wider
:
tidyr::pivot_wider(x, names_from = c3, values_from = c2,
values_fn = sum, values_fill = 0)
# c1 sp1 sp2 sp3 sp4
# <chr> <int> <int> <int> <int>
#1 A 1 4 0 0
#2 B 0 2 5 0
#3 C 0 3 0 6
data.table
is usually faster in which case you can use -
library(data.table)
dcast(setDT(x), c1~c3, value.var = 'c2', fun.aggregate = sum)
Output can also be represented in long format.
library(dplyr)
x %>%
group_by(c1, c3) %>%
summarise(c2 = sum(c2)) %>%
ungroup
and with data.table
-
setDT(x)[, sum(c2), .(c1, c3)]