rdata.table

Cartesian Product using data.table package


Using the data.table package in R, I am trying to create a cartesian product of two data.tables using the merge method as one would do in base R.

In base the following works:

#assume this order data
orders <- data.frame(date = as.POSIXct(c('2012-08-28','2012-08-29','2012-09-01')),
                     first.name = as.character(c('John','George','Henry')),
                     last.name = as.character(c('Doe','Smith','Smith')),
                     qty = c(10,50,6))

#and these dates
dates <- data.frame(date = seq(from = as.POSIXct('2012-08-28'),
                               to = as.POSIXct('2012-09-07'), by = 'day'))

#get the unique customers
cust<-unique(orders[,c('first.name','last.name')])

#using merge from base R, get the cartesian product
merge(dates, cust, by = integer(0))

However, the same technique does not work using data.table and this error is thrown:

"Error in merge.data.table(dates.dt, cust.dt, by = integer(0)) : 
  A non-empty vector of column names for `by` is required."
#data.table approach
library(data.table)

orders.dt <- data.table(orders)

dates.dt <- data.table(dates)

cust.dt <- unique(orders.dt[, list(first.name, last.name)])

#try to use merge (data.table) in the same manner as base
merge(dates.dt, cust.dt, by = integer(0))
Error in merge.data.table(dates.dt, cust.dt, by = integer(0)) : 
  A non-empty vector of column names for `by` is required.

I want the result to reflect all customer names for all dates, just like in base, but do it in a data.table-centric way. Is this possible?


Solution

  • If you first construct full names from the first and last in the cust-dataframe, you can then use CJ (cross-join). You cannot use all three vectors since there would be 99 items and teh first names would get inappropriately mixed with last names.

    > nrow(CJ(dates$date, cust$first.name, cust$last.name ) )
    [1] 99
    

    This returns the desired data.table object:

    > CJ(dates$date,paste(cust$first.name, cust$last.name) )
                V1           V2
     1: 2012-08-28 George Smith
     2: 2012-08-28  Henry Smith
     3: 2012-08-28     John Doe
     4: 2012-08-29 George Smith
     5: 2012-08-29  Henry Smith
     6: 2012-08-29     John Doe
     7: 2012-08-30 George Smith
     8: 2012-08-30  Henry Smith
     9: 2012-08-30     John Doe
    10: 2012-08-31     John Doe
    11: 2012-08-31 George Smith
    12: 2012-08-31  Henry Smith
    13: 2012-09-01     John Doe
    14: 2012-09-01 George Smith
    15: 2012-09-01  Henry Smith
    16: 2012-09-02 George Smith
    17: 2012-09-02  Henry Smith
    18: 2012-09-02     John Doe
    19: 2012-09-03  Henry Smith
    20: 2012-09-03     John Doe
    21: 2012-09-03 George Smith
    22: 2012-09-04  Henry Smith
    23: 2012-09-04     John Doe
    24: 2012-09-04 George Smith
    25: 2012-09-05 George Smith
    26: 2012-09-05  Henry Smith
    27: 2012-09-05     John Doe
    28: 2012-09-06 George Smith
    29: 2012-09-06  Henry Smith
    30: 2012-09-06     John Doe
    31: 2012-09-07 George Smith
    32: 2012-09-07  Henry Smith
    33: 2012-09-07     John Doe
                V1           V2