Tags: python, scikit-learn, cross-validation, pycaret

Custom cross-validation and pycaret


I've been working with hierarchical time series and, as a result, needed to write my own CV splitter so that all timestamps and products are represented evenly in the test (validation) set. It worked just fine with sklearn, but I can't make it work in pycaret: best = compare_models() returns nothing at all. Here is the custom CV I used:

import numpy as np

class custom_cv:
    def __init__(self, train_end, test_size, n_splits): # val_size
        self.train_end = train_end
        # self.val_size = val_size
        self.test_size = test_size
        self.n_splits = n_splits
        
    def split(self, X):
        self.X = X
        
        for i in range(self.n_splits, 0, -1): # range(start, stop, step)
            tr_threshol = self.train_end - self.test_size*i
            te_threshol = tr_threshol + self.test_size
    
            tr_idx = np.array(self.X.reset_index(drop = True).index[self.X['N_month'] <= tr_threshol])
            te_idx = np.array(self.X.index[(self.X['N_month'] > tr_threshol) & (self.X['N_month'] <= te_threshol)])
        
            yield(tr_idx, te_idx)

custom_CV = custom_cv(train_end = 365, test_size = 28, n_splits = 5)
# custom_CV = custom_CV.split(X = df)

My data looks like this: [image]

For sklearn I used the following loop:

def custom_cv(df, train_end = 36, test_size = 4, n_splits = 4):
    cv_idx = []

    for i in range(n_splits, 0, -1): # range(start, stop, step)
        tr_threshol = train_end - test_size*i
        te_threshol = tr_threshol + test_size
    
        tr_idx = list(df.reset_index(drop = True).index[df['N_month'] <= tr_threshol])
        te_idx = list(df.index[(df['N_month'] > tr_threshol) & (df['N_month'] <= te_threshol)])
    
        cv_idx.append((tr_idx, te_idx))
    
    return cv_idx

custom_CV = custom_cv(df = df, train_end = 365, test_size = 28, n_splits = 5)

However, pycaret requires a custom CV generator object compatible with scikit-learn (something I've never dealt with before). I can't figure out what's wrong exactly, and I hope you can kindly help me out.


Solution

  • Your class for PyCaret is probably missing the get_n_splits method. I had a similar problem and solved it with a class structure like the one shown here:

    How to generate a custom cross-validation generator in scikit-learn?
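
For illustration, here is a minimal sketch of what such a class could look like for the splitter in the question. It is not the code from the linked answer. The class name MonthThresholdCV and the time_col parameter are made up for this example, and it assumes the N_month column is still present in whatever data is handed to split. The two changes relative to the question's class are the added get_n_splits method and the scikit-learn style split(X, y=None, groups=None) signature.

```python
import numpy as np
import pandas as pd


class MonthThresholdCV:
    """Expanding-window splitter keyed on an integer time column.

    Hypothetical name; mirrors the thresholds used in the question.
    """

    def __init__(self, train_end, test_size, n_splits, time_col="N_month"):
        self.train_end = train_end
        self.test_size = test_size
        self.n_splits = n_splits
        self.time_col = time_col  # assumption: column exists in X

    def get_n_splits(self, X=None, y=None, groups=None):
        # scikit-learn style tools call this to learn the number of folds.
        return self.n_splits

    def split(self, X, y=None, groups=None):
        # Use a plain array so the yielded indices are positional (0..n-1),
        # independent of the DataFrame's index labels.
        t = np.asarray(X[self.time_col])
        for i in range(self.n_splits, 0, -1):
            tr_threshold = self.train_end - self.test_size * i
            te_threshold = tr_threshold + self.test_size
            tr_idx = np.flatnonzero(t <= tr_threshold)
            te_idx = np.flatnonzero((t > tr_threshold) & (t <= te_threshold))
            yield tr_idx, te_idx


# Toy data (made up): 3 products observed over 12 periods each.
df = pd.DataFrame({
    "N_month": np.tile(np.arange(1, 13), 3),
    "product": np.repeat(["a", "b", "c"], 12),
})

cv = MonthThresholdCV(train_end=12, test_size=2, n_splits=3)
for tr_idx, te_idx in cv.split(df):
    print(len(tr_idx), len(te_idx))
```

An instance like cv can then be tried wherever PyCaret accepts a scikit-learn compatible CV generator object. Whether N_month survives PyCaret's preprocessing in your setup is something to verify, since the splitter reads that column from the data it receives.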