
Faster alternatives to a for-loop


I have the following problem:

My data frame looks like the following, although the real one is much bigger (20 GB):

Letters <- c("A","B","C")
Numbers <- c(1,0,1)
Numbers <- as.integer(Numbers)

Data.Frame <- data.frame(Letters,Numbers)

Now I want to create a dummy variable for each level of Letters, so I wrote the following for-loop:

for (level in unique(Data.Frame$Letters)) {
  Data.Frame[paste("", level, sep = "")] <- ifelse(Data.Frame$Letters == level, 1, 0)
}

Because my data frame is so large, though, this takes a very long time to execute. Another possible solution I tried was:

factors <- model.matrix(~Letters-1, data=Data.Frame)
cbind(Data.Frame, factors)

The result is the same, but when I use this on a larger data frame it takes even longer.

Are there any alternatives that produce the same result and are computationally faster?

Thank you very much in advance!


Solution

  • You could use dcast.data.table from the data.table package, like this:

    library(data.table)
    dt <- data.table(Letters, Numbers)
    dcast.data.table(dt, Letters + Numbers ~ Letters, fun.aggregate = length)
    
       Letters Numbers A B C
    1:       A       1 1 0 0
    2:       B       0 0 1 0
    3:       C       1 0 0 1
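
If reshaping the whole table turns out to be expensive at 20 GB, another option that may be worth benchmarking is to keep the loop from the question but add each dummy column by reference with data.table's set(), so the table is not copied on every assignment. This is only a sketch on the toy data from the question, not a measured result; it also swaps ifelse() for a plain logical comparison converted to integer, which is my own substitution.

```r
library(data.table)

# Same toy data as in the question
Letters <- c("A", "B", "C")
Numbers <- as.integer(c(1, 0, 1))
dt <- data.table(Letters, Numbers)

# Add one integer 0/1 column per distinct letter.
# set() modifies dt in place instead of creating a modified copy.
for (level in unique(dt$Letters)) {
  set(dt, j = level, value = as.integer(dt$Letters == level))
}

print(dt)
```

Unlike the dcast call, this leaves the rows in their original order and does not aggregate, so duplicated Letters/Numbers combinations stay as separate rows.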