I have a data.table called tmp.df.lhs.denorm; its first two rows are shown below:
> dput(tmp.df.lhs.denorm[1:2])
structure(list(rules = c("{} => {Dental anesthetic products-Injectables cartridges|2288210-Septocaine Cart 4% w/EPI}",
"{Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp1,Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp2} => {Dental small equipment-Water distiller parts & acc|5528004-EzeeKleen 2.5HD RO Membra}"
), support = c(0.501710236989983, 0.000610798924993892), confidence = c(0.501710236989983,
1), lift = c(1, 1637.2), rule.id = 1:2, lhs_1 = c(NA, "Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp1"
), lhs_2 = c(NA, "Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp2"
)), .Names = c("rules", "support", "confidence", "lift", "rule.id",
"lhs_1", "lhs_2"), class = c("data.table", "data.frame"), row.names = c(NA,
-2L), .internal.selfref = <pointer: 0x0000000007120788>)
Note the columns lhs_1 and lhs_2, which are the result of a string split on the rules column.
My problem is that, for different data, the rules column may contain a varying number of comma-separated items, so I could end up with three columns (lhs_1, lhs_2 and lhs_3) or more, depending on how many commas appear in rules. The solution I have in mind is to fix the number of lhs_* columns (a parameter in my code, say 6), so that in this specific example tmp.df.lhs.denorm would gain four additional empty columns named lhs_3, lhs_4, lhs_5 and lhs_6. Any assistance is appreciated.
I found a workaround that does the job:
# Empty template table that carries the full set of lhs_* columns
tmp.df.lhs.denorm.art <- data.table(rules      = character(),
                                    support    = numeric(),
                                    confidence = numeric(),
                                    lift       = numeric(),
                                    rule.id    = integer(),
                                    lhs_1      = character(),
                                    lhs_2      = character(),
                                    lhs_3      = character(),
                                    lhs_4      = character(),
                                    lhs_5      = character(),
                                    lhs_6      = character()
)
# fill = TRUE adds the columns missing from the real table and fills them with NA
tmp.df.lhs.denorm.complete <- rbindlist(list(tmp.df.lhs.denorm, tmp.df.lhs.denorm.art), fill=TRUE)
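The workaround above hard-codes the six lhs_* columns. If the count should really be a parameter, one possible variant is to compute the wanted column names and add only the missing ones by reference. This is a minimal sketch, not a tested drop-in: the names n_lhs, wanted and missing_cols are made up for illustration, and the small dt below is a toy stand-in for tmp.df.lhs.denorm.

```r
library(data.table)

# Assumption: n_lhs is the fixed number of lhs_* columns you want (hypothetical name)
n_lhs <- 6L

# Toy stand-in for tmp.df.lhs.denorm with only lhs_1 and lhs_2 present
dt <- data.table(rule.id = 1:2,
                 lhs_1   = c(NA, "a"),
                 lhs_2   = c(NA, "b"))

# Names the table should end up with, and the ones it does not have yet
wanted       <- paste0("lhs_", seq_len(n_lhs))
missing_cols <- setdiff(wanted, names(dt))

# Add the missing columns by reference, filled with character NA;
# the guard skips the assignment when nothing is missing
if (length(missing_cols) > 0L) {
  dt[, (missing_cols) := NA_character_]
}
```

Unlike the rbindlist approach, this modifies the table in place rather than creating a copy, and it does not need the non-lhs columns to be listed again. If the real table already has more than n_lhs columns, they are left untouched; whether that should instead be an error is a decision for your own code.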