Tags: python, for-loop, machine-learning, roc, auc

How to train a series of XGBClassifier models in Python, using all variables first and one fewer variable after each iteration?


I have a pandas DataFrame like the one below:

Input data:

Requirements: I need to:

Desired output:

So, as a result, I need to have something like the below:

My draft, which is wrong: it should loop through all the variables so that each iteration creates a new XGBoost classification model and then discards one of the variables before the next model is created.

X_train, X_test, y_train, y_test = train_test_split(df.drop("Y", axis=1)
                                                    , df.Y
                                                    , train_size = 0.70
                                                    , test_size=0.30
                                                    , random_state=1
                                                    , stratify = df.Y)

results = []
list_of_models = []

for val in X_train:

    model = XGBClassifier()
    model.fit(X_train, y_train)
    list_of_models.append(model)

    preds_train = model.predict(X_train)
    preds_test = model.predict(X_test)
    preds_prob_train = model.predict_proba(X_train)[:,1]
    preds_prob_test = model.predict_proba(X_test)[:,1]

    results.append({"AUC_train":round(metrics.roc_auc_score(y_train,preds_prob_train),3),
                    "AUC_test":round(metrics.roc_auc_score(y_test,preds_prob_test),3)})

results = pd.DataFrame(results)

How can I do that in Python?


Solution

  • If I understand correctly, you want to make the data narrower on each pass through the loop: fit a model on the columns that are left, then drop one column before the next iteration. You could do something like this:

    results = []
    list_of_models = []

    # X_train.columns is evaluated once, so dropping columns inside the loop is safe
    for i in X_train.columns:
        # Fit a fresh model on the columns that are still left
        model = XGBClassifier()
        model.fit(X_train, y_train)
        list_of_models.append(model)

        preds_train = model.predict(X_train)
        preds_test = model.predict(X_test)
        preds_prob_train = model.predict_proba(X_train)[:,1]
        preds_prob_test = model.predict_proba(X_test)[:,1]
        results.append({"AUC_train":round(metrics.roc_auc_score(y_train,preds_prob_train),3),
                        "AUC_test":round(metrics.roc_auc_score(y_test,preds_prob_test),3)})

        # Drop the current column so the next model is trained with one variable fewer
        X_train = X_train.drop(i, axis=1)
        X_test = X_test.drop(i, axis=1)

    results = pd.DataFrame(results)
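
If you would rather not overwrite `X_train` and `X_test` inside the loop, the same idea can be wrapped in a helper that tracks the remaining column names instead. This is only a sketch under a few assumptions: the name `auc_per_feature_subset`, the `make_model` parameter and the `n_vars` / `vars` result columns are made up for illustration, and variables are discarded in column order because the question does not say which one should go first. When `make_model` is not given, it falls back to a default `XGBClassifier()`.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score


def auc_per_feature_subset(X_train, X_test, y_train, y_test, make_model=None):
    """Fit one model per step, discarding the first remaining column after each fit.

    Hypothetical helper: returns (results DataFrame, list of fitted models).
    """
    if make_model is None:
        # Assumed default: a plain XGBClassifier, imported only when needed
        from xgboost import XGBClassifier
        make_model = XGBClassifier

    remaining = list(X_train.columns)  # variables still in play
    models, rows = [], []

    while remaining:
        # New model for every subset of variables
        model = make_model()
        model.fit(X_train[remaining], y_train)
        models.append(model)

        # Probability of the positive class, as required for ROC AUC
        prob_train = model.predict_proba(X_train[remaining])[:, 1]
        prob_test = model.predict_proba(X_test[remaining])[:, 1]

        rows.append({
            "n_vars": len(remaining),
            "vars": ", ".join(map(str, remaining)),
            "AUC_train": round(roc_auc_score(y_train, prob_train), 3),
            "AUC_test": round(roc_auc_score(y_test, prob_test), 3),
        })

        # Assumption: discard variables in column order (first one goes first)
        remaining = remaining[1:]

    return pd.DataFrame(rows), models
```

Calling `results, list_of_models = auc_per_feature_subset(X_train, X_test, y_train, y_test)` should give one row and one fitted model per step, starting with all variables and ending with a single one, while the original DataFrames keep all their columns.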