I am writing a Python program that interacts with R. I have some R libraries that I want to use from my Python code. After installing rpy2, I define the R functions I want to use in a separate .R script file.
The R function needs a formula passed to it in order to apply an oversampling technique. Below is the R function that I wrote:
WFRandUnder <- function(target_variable, other, train, rel, thr.rel, C.perc, repl){
  a <- target_variable
  b <- '~'
  form_begin <- paste(a, b, sep=' ')
  fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))
  undersampled = RandUnderRegress(fmla, train, rel, thr.rel, C.perc, repl)
  return(undersampled)
}
From Python, I pass the target variable name and a list containing the names of all the other columns, because I want the formula to look like this:
my_target_variable ~ all other columns
However, in these lines:
a <- target_variable
b <- '~'
form_begin <- paste(a, b, sep=' ')
fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))
the formula is not always built correctly when my data has many columns. What should I do to make it always work? I am concatenating all the column names with the + operator.
Thanks to @nicola, I was able to solve this problem by doing the following:
create_formula <- function(target_variable, other){
  # y <- target_variable
  # tilda <- '~'
  # form_begin <- paste(y, tilda, sep=' ')
  # fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))
  # return(fmla)
  y <- target_variable
  fmla = as.formula(paste(y, '~ .'))
  return(fmla)
}
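For comparison, the two formula shapes in create_formula can be sketched as plain strings on the Python side. This is only an illustration: the function names explicit_formula and dot_formula and the column names are made up for the example and are not part of the original code.

```python
def explicit_formula(target, predictors):
    """Build 'target ~ a + b + c', the shape of the commented-out approach."""
    if not predictors:
        raise ValueError("at least one predictor name is required")
    return f"{target} ~ {' + '.join(predictors)}"


def dot_formula(target):
    """Build 'target ~ .', which leaves the predictor list to R and the data."""
    return f"{target} ~ ."


# Hypothetical column names, for illustration only.
print(explicit_formula("price", ["rooms", "area", "age"]))
print(dot_formula("price"))
```

The second form stays the same length no matter how many columns the data has, which is why it avoids assembling a long list of names by hand.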
I call this function from my Python program using rpy2. This causes no problems, because whenever the formula is used, the data itself is passed along with it, so R can expand the . into all the other columns. Sample code to demonstrate what I mean:
if self.smogn:
    smogned = runit.WFDIBS(
        # here is the formula call (get_formula is a python function that calls create_formula defined above in R)
        fmla=get_formula(self.target_variable, self.other),
        # here is the data
        dat=df_combined,
        method=self.phi_params['method'][0],
        npts=self.phi_params['npts'][0],
        controlpts=self.phi_params['control.pts'],
        thrrel=self.thr_rel,
        Cperc=self.Cperc,
        k=self.k,
        repl=self.repl,
        dist=self.dist,
        p=self.p,
        pert=self.pert)
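The get_formula helper itself is not shown above. A minimal sketch of one way it could look is given here, assuming rpy2 is installed. Instead of calling create_formula in R, this version builds the formula directly with rpy2's Formula class, so it is an alternative and not necessarily what the original helper does. The name formula_text is hypothetical, and the other argument is kept only so the call signature matches the snippet above.

```python
def formula_text(target_variable):
    """Return the formula as text, e.g. 'price ~ .'."""
    return f"{target_variable} ~ ."


def get_formula(target_variable, other=None):
    """Return an R formula object for 'target_variable ~ .'.

    `other` is accepted for compatibility with the call shown above but is
    unused: the '.' is resolved by R against the data passed with the formula.
    """
    # Imported lazily so the string helper works even where rpy2 is not installed.
    from rpy2.robjects import Formula
    return Formula(formula_text(target_variable))


# The text that would be handed to R, using a hypothetical column name.
print(formula_text("price"))
```

Because the column list never crosses from Python to R, the number of columns no longer affects how the formula is built.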