This is a sample dataset.
library(data.table)
set.seed(2)
samplesize<-10;
numeric.id<-as.numeric(sample(CJ(1:27,1:27,1:27)[,paste0(V1,V2,V3)],samplesize,replace=T));
numericandcharmixeddata<-sample(CJ(c( LETTERS, " "),c(1:27),c(letters," "))[,paste0(V1,V2,V3)],samplesize, replace = T);
characterdata<-sample(CJ(LETTERS,letters)[,paste0(LETTERS,letters)],samplesize, replace = T)
colgen<-\(x){sample(get(sample(c("numeric.id","numericandcharmixeddata","characterdata"), 1, replace = T)),x,replace = T)}
colsize<-9;
rowsize<-500
DT<-as.data.table(cbind(sapply(1:colsize, function(x) colgen(rowsize))))
colnames(DT)<-sapply(1:colsize,\(x){paste0("Level",x)});
DT
needs to be filtered based on provided conditions.for e.g. here is a data requirement conditions.
library(openxlsx)
inputtable<-data.table(Level9 = c("K25h",NA,NA),Level3 = c("71627","171721",NA),Level5 = c("Ez","Ai","Bs"))
while importing it to R the blanks were converted to NA.
Something regarding the inputtable
DT
.DT
have. SO the code need to be dynamic or generalized.inputtable
, all possible combinations of items of the Level9
, Level3
& Level5
are required to be filtered from DT
.This should be output if filtered with the conditions as set in inputtable
.
Please ask if anything is confusing or not clear.
I am trying to create a Cartesian join and then match it with DT
.
I would convert your inputtable
to a list
and get rid of the NA
values, then calculate the inclusion of each column individually, and finally filter the data table by those inclusions:
filter_vals = lapply(inputtable, na.omit)
bools = sapply(names(filter_vals), \(x) DT[[x]] %in% filter_vals[[x]])
DT[apply(bools, 1, all), ]
# Level1 Level2 Level3 Level4 Level5 Level6 Level7 Level8 Level9
# <char> <char> <char> <char> <char> <char> <char> <char> <char>
# 1: Jd N22p 71627 M12g Ez Ai M12g 17725 K25h
# 2: Ok M15h 71627 26g Ai Yv M12g 241210 K25h
# 3: Qm M12g 171721 F20z Bs Ai M12g 52024 K25h
# 4: Ok T9c 71627 M15h Bs Ok M12g 71627 K25h
# 5: Jd K25h 171721 T9c Bs Oz N15z 52024 K25h
You could also use that first step for your cross-join, merge approach:
filter_table = do.call(CJ, filter_vals)
merge(DT, filter_table)
# Key: <Level3, Level5, Level9>
# Level3 Level5 Level9 Level1 Level2 Level4 Level6 Level7 Level8
# <char> <char> <char> <char> <char> <char> <char> <char> <char>
# 1: 171721 Bs K25h Qm M12g F20z Ai M12g 52024
# 2: 171721 Bs K25h Jd K25h T9c Oz N15z 52024
# 3: 71627 Ai K25h Ok M15h 26g Yv M12g 241210
# 4: 71627 Bs K25h Ok T9c M15h Ok M12g 71627
# 5: 71627 Ez K25h Jd N22p M12g Ai M12g 17725