Tags: python, r, formula, rpy2, oversampling

Creating R's formula using Python


I am writing a Python program that interacts with R. I have some R libraries that I want to use from my Python code. After installing rpy2, I define the R functions I want to use in a separate .R script.

The R function needs a formula passed to it in order to apply a resampling technique (here, random undersampling). Below is the R function that I wrote:

WFRandUnder <- function(target_variable, other, train, rel, thr.rel, C.perc, repl){
    a <- target_variable
    b <- '~'
    form_begin <- paste(a, b, sep=' ')
    fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))
    undersampled = RandUnderRegress(fmla, train, rel, thr.rel, C.perc, repl)
    return(undersampled)
}

From Python, I pass the target variable name and a list containing the names of all the other columns. I want the formula to have the form: my_target_variable ~ all other columns

However, in these lines:

    a <- target_variable
    b <- '~'
    form_begin <- paste(a, b, sep=' ')
    fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))

The formula is not always built correctly when my data has many columns. What should I do to make it always work? I am concatenating all the column names with the + operator.
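A quick way to see what R actually receives is to rebuild the same string on the Python side and print it before calling R. The sketch below is only an illustration, not the code from this question: `quote_r_name` and `build_formula_string` are hypothetical helpers, and the idea that unusual column names (spaces, leading digits) are what breaks the formula is an assumption that the question does not confirm.

```python
import re

# Conservative pattern for names that are safe to use bare in an R formula.
_PLAIN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._]*")
# A few R reserved words that must be quoted (not an exhaustive list).
_RESERVED = {"if", "else", "for", "while", "repeat", "function", "in",
             "next", "break", "TRUE", "FALSE", "NULL", "NA", "NaN", "Inf"}


def quote_r_name(name):
    """Return the name unchanged if it is plain, otherwise wrap it in backticks."""
    if "`" in name:
        # Hypothetical limitation of this sketch: such names are rejected.
        raise ValueError(f"backticks in column names are not handled: {name!r}")
    if _PLAIN_NAME.fullmatch(name) and name not in _RESERVED:
        return name
    return f"`{name}`"


def build_formula_string(target, predictors):
    """Build 'target ~ p1 + p2 + ...' as a string, quoting names where needed."""
    rhs = " + ".join(quote_r_name(p) for p in predictors)
    return f"{quote_r_name(target)} ~ {rhs}"


formula = build_formula_string("price", ["area", "n rooms", "2nd_floor"])
print(formula)  # price ~ area + `n rooms` + `2nd_floor`
```

Printing the string makes it easy to spot a column name that R would not parse as a single variable, which is harder to see once the pieces have been pasted together inside R.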


Solution

  • Thanks to @nicola, I was able to solve this problem by doing the following:

    create_formula <- function(target_variable, other){
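        # Earlier approach, kept for reference: list every column explicitly.
        # y <- target_variable
        # tilde <- '~'
        # form_begin <- paste(y, tilde, sep=' ')
        # fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))
        # return(fmla)

        # Current approach: '.' stands for all other columns in the data,
        # so 'other' is no longer needed to build the formula.
        y <- target_variable
        fmla <- as.formula(paste(y, '~ .'))
        return(fmla)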
    }
    

    I call this function from my Python program using rpy2. This causes no problems because the formula is always used together with the data itself, so the '.' on the right-hand side resolves to all the other columns. Sample code to demonstrate what I mean:

    if self.smogn:
        smogned = runit.WFDIBS(
            # Here is the formula call (get_formula is a Python function
            # that calls create_formula, defined above in R).
            fmla=get_formula(self.target_variable, self.other),

            # Here is the data.
            dat=df_combined,

            method=self.phi_params['method'][0],
            npts=self.phi_params['npts'][0],
            controlpts=self.phi_params['control.pts'],
            thrrel=self.thr_rel,
            Cperc=self.Cperc,
            k=self.k,
            repl=self.repl,
            dist=self.dist,
            p=self.p,
            pert=self.pert)
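To make the reasoning behind the '~ .' formula concrete, here is a minimal Python sketch of what the dot stands for once the formula is evaluated against the data. The helper names `dot_formula` and `predictors_implied_by_dot` are hypothetical and are not part of rpy2 or of the code above. The sketch assumes the R function resolves '.' the way R's standard modelling functions do, as "every column of the data that is not otherwise in the formula".

```python
def dot_formula(target):
    """Return the formula string 'target ~ .' for the given target column."""
    return f"{target} ~ ."


def predictors_implied_by_dot(target, columns):
    """List the columns that '.' stands for, given the data's column names."""
    if target not in columns:
        raise ValueError(f"target {target!r} is not a column of the data")
    # Every column except the target ends up on the right-hand side.
    return [c for c in columns if c != target]


columns = ["area", "price", "rooms"]  # illustrative column names
print(dot_formula("price"))  # price ~ .
print(predictors_implied_by_dot("price", columns))  # ['area', 'rooms']
```

Because the right-hand side is resolved from the data at call time, the formula string stays the same length no matter how many columns there are. On the Python side, a string like this can be turned into an R formula object with `rpy2.robjects.Formula`; whether `get_formula` does that or calls `create_formula` in R, as described above, makes no difference to the result.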