I have the following problem:
My data frame looks like the following, although the real one is much bigger (20 GB):
Letters <- c("A","B","C")
Numbers <- c(1,0,1)
Numbers <- as.integer(Numbers)
Data.Frame <- data.frame(Letters,Numbers)
Now I want to create a Dummy Variable for the Letters and wrote the following for-loop:
for (level in unique(Data.Frame$Letters)) {
  Data.Frame[paste("", level, sep = "")] <- ifelse(Data.Frame$Letters == level, 1, 0)
}
Because my data frame is so large, though, this takes a very long time to execute. Another solution I tried was:
factors <- model.matrix(~Letters-1, data=Data.Frame)
cbind(Data.Frame, factors)
The result is the same, but on a larger data frame it takes even longer.
Are there any alternatives that give the same result but are computationally faster?
Thank you very much in advance!
You could use dcast.data.table from the data.table package, like this:
library(data.table)
dt <- data.table(Letters, Numbers)
dcast.data.table(dt, Letters + Numbers ~ Letters, fun.aggregate = length)
   Letters Numbers A B C
1:       A       1 1 0 0
2:       B       0 0 1 0
3:       C       1 0 0 1
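
One caveat with the call above: with fun.aggregate = length, rows that share the same Letters and Numbers values are collapsed into a single row holding counts, which is probably not what you want for dummy variables on real data. A minimal sketch of one way around that, assuming a data.table version where dcast dispatches on data.table objects. The row_id column and the four-row toy data are my own additions for illustration, not part of your data:

```r
library(data.table)

# Toy data with a repeated (Letters, Numbers) pair: rows 1 and 4 are identical.
dt <- data.table(Letters = c("A", "B", "C", "A"),
                 Numbers = c(1L, 0L, 1L, 1L))

# 'row_id' is a hypothetical helper column; it makes every row unique on the
# left-hand side of the formula, so no rows are merged during the cast.
dt[, row_id := .I]

# One indicator column per level of Letters; cells without a match are
# filled with length(<empty vector>), i.e. 0.
dummies <- dcast(dt, row_id + Letters + Numbers ~ Letters,
                 fun.aggregate = length, value.var = "Numbers")

print(dummies)
```

If you do not need the helper column afterwards, dummies[, row_id := NULL] drops it by reference, without copying the table.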