
R: create data.table columns dynamically


I have a data.table called tmp.df.lhs.denorm. Its first two rows are:

    > dput(tmp.df.lhs.denorm[1:2])
    structure(list(rules = c("{} => {Dental anesthetic products-Injectables cartridges|2288210-Septocaine Cart 4% w/EPI}",
    "{Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp1,Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp2} => {Dental small equipment-Water distiller parts & acc|5528004-EzeeKleen 2.5HD RO Membra}"
    ), support = c(0.501710236989983, 0.000610798924993892), confidence = c(0.501710236989983,
    1), lift = c(1, 1637.2), rule.id = 1:2, lhs_1 = c(NA, "Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp1"
    ), lhs_2 = c(NA, "Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp2"
    )), .Names = c("rules", "support", "confidence", "lift", "rule.id",
    "lhs_1", "lhs_2"), class = c("data.table", "data.frame"), row.names = c(NA,
    -2L))

Note the columns lhs_1 and lhs_2, which are produced by string-splitting the rules column on commas (one column per item on the left-hand side of the rule).
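Because the lhs_* columns come from splitting the rules string, one option is to fix their number at split time. This is a minimal sketch, assuming every rule has the form {item,item,...} => {item} and that item names never contain commas (true for the two rows above). The helper split_lhs_fixed, its parameter n_lhs and the toy table rules_dt are hypothetical names, not part of any package.

```r
library(data.table)

# Hypothetical helper: split the left-hand side of each rule into exactly
# n_lhs character columns lhs_1 ... lhs_<n_lhs>, added to dt by reference.
# Assumes rules look like "{a,b} => {c}" and item names contain no commas.
split_lhs_fixed <- function(dt, n_lhs = 6L) {
  # Keep only the text between the leading "{" and "} => {"
  lhs <- sub("^[{](.*)[}] => [{].*[}]$", "\\1", dt$rules)
  lhs[lhs == ""] <- NA_character_   # an empty LHS "{}" becomes NA
  pieces <- strsplit(lhs, ",", fixed = TRUE)
  # Indexing past the end returns NA, so every row yields exactly n_lhs items;
  # rows with more than n_lhs items are truncated
  pieces <- lapply(pieces, function(x) x[seq_len(n_lhs)])
  cols <- paste0("lhs_", seq_len(n_lhs))
  dt[, (cols) := transpose(pieces)]
  invisible(dt)
}

# Hypothetical toy data in the same shape as the rules column
rules_dt <- data.table(rules = c("{} => {X}", "{A,B} => {C}"))
rules_dt <- split_lhs_fixed(rules_dt, n_lhs = 6L)
```

Rows with more than n_lhs items are silently truncated, so n_lhs should be at least the largest number of left-hand-side items in the data.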

My problem is that, for different data, the rules column may contain a varying number of items separated by commas, so I could just as well get three columns (lhs_1, lhs_2 and lhs_3) or more, depending on how many commas there are. What I need is a fixed number of lhs_* columns, set by a parameter in my code (say 6). In this specific example, tmp.df.lhs.denorm would then gain four additional empty columns named lhs_3, lhs_4, lhs_5 and lhs_6. Any assistance is appreciated.
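When the lhs_* columns already exist, the padding up to a fixed count can be sketched with data.table's := operator, which adds the missing columns by reference instead of building a new table. This assumes the existing columns follow the lhs_<k> naming; pad_lhs_columns and example_dt are hypothetical names used only for illustration.

```r
library(data.table)

# Hypothetical helper: ensure dt has columns lhs_1 ... lhs_<n_lhs>, adding any
# missing ones as all-NA character columns. dt is modified by reference.
pad_lhs_columns <- function(dt, n_lhs = 6L) {
  wanted <- paste0("lhs_", seq_len(n_lhs))
  to_add <- setdiff(wanted, names(dt))
  if (length(to_add) > 0L) {
    # One NA_character_ per new column; := recycles it over all rows
    dt[, (to_add) := rep(list(NA_character_), length(to_add))]
  }
  invisible(dt)
}

# Hypothetical table with only lhs_1 and lhs_2, like the example above
example_dt <- data.table(rule.id = 1:2,
                         lhs_1 = c(NA, "A"),
                         lhs_2 = c(NA, "B"))
example_dt <- pad_lhs_columns(example_dt, n_lhs = 6L)
```

Existing lhs_* columns are left untouched, and columns beyond n_lhs (for example lhs_7) are not removed.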


Solution

  • I found a workaround that does the job:

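    library(data.table)

    # Zero-row template holding every column the final table should have,
    # including the lhs_* columns that may be missing from tmp.df.lhs.denorm
    tmp.df.lhs.denorm.art <- data.table(rules = character(),
                                        support = numeric(),
                                        confidence = numeric(),
                                        lift = numeric(),
                                        rule.id = integer(),
                                        lhs_1 = character(),
                                        lhs_2 = character(),
                                        lhs_3 = character(),
                                        lhs_4 = character(),
                                        lhs_5 = character(),
                                        lhs_6 = character())

    # fill = TRUE matches columns by name and fills the absent ones with NA
    tmp.df.lhs.denorm.complete <- rbindlist(list(tmp.df.lhs.denorm, tmp.df.lhs.denorm.art),
                                            fill = TRUE)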
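The same rbindlist workaround can be driven by the parameter instead of listing lhs_1 to lhs_6 by hand. This is a minimal sketch: the two-row table below is a hypothetical stand-in for tmp.df.lhs.denorm with shortened rule strings, and n_lhs is a hypothetical variable holding the desired column count. The template only needs the lhs_* columns, because fill = TRUE keeps the other columns from the first table.

```r
library(data.table)

# Hypothetical stand-in for tmp.df.lhs.denorm (same column layout, toy values)
tmp.df.lhs.denorm <- data.table(
  rules = c("{} => {X}", "{A,B} => {C}"),
  support = c(0.5, 0.0006), confidence = c(0.5, 1), lift = c(1, 1637.2),
  rule.id = 1:2, lhs_1 = c(NA, "A"), lhs_2 = c(NA, "B")
)

n_lhs <- 6L  # desired number of lhs_* columns (the parameter from the question)

# Zero-row template with one empty character column per lhs_* name
template <- as.data.table(setNames(rep(list(character()), n_lhs),
                                   paste0("lhs_", seq_len(n_lhs))))

# fill = TRUE matches columns by name; lhs_3 ... lhs_6 are filled with NA
tmp.df.lhs.denorm.complete <- rbindlist(list(tmp.df.lhs.denorm, template),
                                        fill = TRUE)
```

rbindlist builds a new table, so tmp.df.lhs.denorm itself is left unchanged.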