python  class  pipeline  smote

Sklearn Pipeline - Customized 'Optional Estimator'


I have written the function below, which builds a pipeline and returns it.

# Assumed imports: ImblearnPipeline is taken to be imbalanced-learn's Pipeline.
from imblearn.over_sampling import SMOTENC
from imblearn.pipeline import Pipeline as ImblearnPipeline

def make_final_pipeline(columns_transformer, onehotencoder, estimator,
                        Name_of_estimator, index_of_categorical_features,
                        use_smote=True):
    if use_smote:
        # Final pipeline with the SMOTE-NC and the estimator.
        finalPipeline = ImblearnPipeline(
            steps=[('col_transformer', columns_transformer),
                   ('smote', SMOTENC(categorical_features=index_of_categorical_features,
                                     sampling_strategy='auto')),
                   ('oneHotColumnEncoder', onehotencoder),
                   (Name_of_estimator, estimator)
                  ]
        )
    else:
        # Final pipeline with the estimator only.
        finalPipeline = ImblearnPipeline(
            steps=[('col_transformer', columns_transformer),
                   ('oneHotColumnEncoder', onehotencoder),
                   (Name_of_estimator, estimator)
                  ]
        )

    return finalPipeline

In the returned pipeline, the SMOTENC step is optional thanks to use_smote. However, according to this question (Is it possible to toggle a certain step in sklearn pipeline?), it should be possible to write a custom OptionalSMOTENC that takes all the arguments of SMOTENC plus use_smote, so that make_final_pipeline could be written as:

def make_final_pipeline(columns_transformer, onehotencoder, estimator,
                        Name_of_estimator, index_of_categorical_features,
                        use_smote=True):

    # Final pipeline with the optional SMOTE-NC and the estimator.
    finalPipeline = ImblearnPipeline(
        steps=[('col_transformer', columns_transformer),
               ('smote', OptionalSMOTENC(categorical_features=index_of_categorical_features,
                                         sampling_strategy='auto',
                                         use_smote=use_smote)),
               ('oneHotColumnEncoder', onehotencoder),
               (Name_of_estimator, estimator)
              ]
    )
    return finalPipeline

I guess that the OptionalSMOTENC should be like this:

class OptionalSMOTENC(SMOTENC):
    
    def __init__(categorical_features, sampling_strategy='auto', use_smote=True):
        super().__init__()
        self.categorical_features = categorical_features
        self.sampling_strategy = sampling_strategy
        self.smote = smote
    
    def fit(self, X, y = None):
        if self.smote:
            # fit smotenc
        else:
            # do nothing
    def fit_resample(self, X, y = None):
        if self.smote:
            # fit_resample smotenc
        else:
            # do nothing

But I do not know how to write it correctly: can I write class OptionalSMOTENC(SMOTENC), or should I just write class OptionalSMOTENC()? Did I put super().__init__() in the right place?

In short, I am not familiar with how to write such an estimator. Could you help me?


Solution

  • I was finally able to come up with a solution:

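    from imblearn.over_sampling import SMOTENC

    class OptionalSMOTENC(SMOTENC):
        
        # Note: newer imbalanced-learn releases deprecate (and later drop) n_jobs
        # on SMOTENC; omit it here if your installed version rejects it.
        def __init__(self, categorical_features, sampling_strategy='auto', 
                     random_state=None, k_neighbors=5, n_jobs=None, use_smote=True):
            super().__init__(categorical_features, sampling_strategy=sampling_strategy, 
                             random_state=random_state, k_neighbors=k_neighbors, n_jobs=n_jobs)
            # Extra flag that switches the resampling step on or off.
            self.use_smote = use_smote
            
        def fit(self, X, y = None):
            if self.use_smote:
                return super().fit(X, y)
            # Disabled: nothing to fit.
            return self
            
        def fit_resample(self, X, y = None):
            if self.use_smote:
                return super().fit_resample(X, y)
            # Disabled: pass the data through unchanged.
            return X, y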
    

    From my understanding, one could replace SMOTENC with any other estimator and create a class like:

    class OptionalEstimator(Estimator):
        
        # Replace arg1, arg2, arg3 with the arguments of Estimator.
        def __init__(self, arg1, arg2, arg3, use_estimator=True):
            super().__init__(arg1, arg2, arg3)
            self.use_estimator = use_estimator
            
        def fit(self, X, y = None):
            if self.use_estimator:
                return super().fit(X, y)
            return self
            
        def transform(self, X):
            # A transformer's transform takes and returns X only
            # (unlike fit_resample, which returns X, y).
            if self.use_estimator:
                return super().transform(X)
            return X
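
To make the generic pattern concrete without depending on scikit-learn or imbalanced-learn, here is a minimal pure-Python sketch. ToyScaler and OptionalToyScaler are hypothetical stand-ins invented for this illustration, not part of any library. The sketch only shows how the subclass forwards to the parent when the flag is on and passes the data through when it is off.

```python
class ToyScaler:
    """Hypothetical stand-in for a real transformer: scales by the max seen in fit."""

    def __init__(self, floor=1.0):
        self.floor = floor

    def fit(self, X, y=None):
        # Remember the largest value (never below `floor`) for later scaling.
        self.max_ = max(max(X), self.floor)
        return self

    def transform(self, X):
        return [x / self.max_ for x in X]


class OptionalToyScaler(ToyScaler):
    """Same transformer, plus a flag that turns it into a no-op."""

    def __init__(self, floor=1.0, use_estimator=True):
        super().__init__(floor=floor)
        self.use_estimator = use_estimator

    def fit(self, X, y=None):
        if self.use_estimator:
            return super().fit(X, y)
        return self  # disabled: nothing to learn

    def transform(self, X):
        if self.use_estimator:
            return super().transform(X)
        return X  # disabled: hand the input back unchanged


data = [2.0, 4.0, 8.0]
scaled = OptionalToyScaler(use_estimator=True).fit(data).transform(data)
untouched = OptionalToyScaler(use_estimator=False).fit(data).transform(data)
print(scaled)
print(untouched)
```

With a real scikit-learn estimator the same shape applies, but the subclass __init__ should list every parent parameter explicitly (as the OptionalSMOTENC solution does), since scikit-learn reads the constructor signature to discover parameters.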