rmergedplyrdomc

Merge two different dataframes when column with similar value has similar row value


The simple question We have two different dataframes with a similar setup as the output of DF_1 & DF_2

DF_1 = as.data.frame(matrix(c(1,2,3,4,20,55,12,44),nrow=4,ncol=2))
DF_2 = as.data.frame(matrix(c(1,2,3,4,33,32,21,44,66,71),nrow=2,ncol=5))

How does one merge DF_1 and DF_2 to create a combined dataframe call it combined_DF which looks like the output of this

combined_DF = as.data.frame(matrix(c(1,2,1,2,20,55,12,44,3,4,3,4,33,32,33,32,21,44,21,44,66,71,66,71),nrow=4,ncol=6))

What would be a fast way, which creates the same output as combined_DF?

The harder question In reality the two dataframes which needs to be merged is much bigger, in fact

DF1 is 156798 obs. of 9 variables

DF2 is 6441 obs. of 92 variables

Now how does one merge DF_1 and DF_2 to create a combined dataframe call it combined_DF which has 156798 obs. of 99 variables? (same as principle as the simple question, but this time much larger dataframes) Can one use doMC::registerDoMC to speed up this process.

Answers is much appreciated :)


Solution

  • via base R,

     cbind(V6 = DF_1$V2, DF_2[rep(rownames(DF_2), nrow(DF_1) - nrow(DF_2)),])
    
    #    V6 V1 V2 V3 V4 V5
    #1   20  1  3 33 21 66
    #2   55  2  4 32 44 71
    #1.1 12  1  3 33 21 66
    #2.1 44  2  4 32 44 71