I have a DataFrame in Python Pandas like the one below:
Input data:
Y - binary target
X1...X5 - predictors
Y | X1 | X2 | X3 | X4 | X5 |
---|---|---|---|---|---|
1 | 111 | 22 | 1 | 0 | 150 |
0 | 12 | 33 | 1 | 0 | 222 |
1 | 150 | 44 | 0 | 0 | 230 |
0 | 270 | 55 | 0 | 1 | 500 |
... | ... | ... | ... | ... | ... |
Requirements: I need to:
evaluate each model with roc_auc_score
create list_of_models = [], where the created models will be saved, and a DataFrame with AUC on train and test
Desired output:
So, as a result, I need something like the table below:
Model - position of model in list_of_models
Num_var - number of predictors used in model
AUC_train - roc_auc_score on train dataset
AUC_test - roc_auc_score on test dataset
Model | Num_var | AUC_train | AUC_test |
---|---|---|---|
0 | 5 | 0.887 | 0.884 |
1 | 4 | 0.875 | 0.845 |
2 | 3 | 0.854 | 0.843 |
3 | 2 | 0.965 | 0.928 |
4 | 1 | 0.922 | 0.921 |
My draft, which is wrong: it should loop through all the variables so that each iteration creates a new XGBoost classification model and then discards one of the variables before the next model is created.
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(df.drop("Y", axis=1),
                                                    df.Y,
                                                    train_size=0.70,
                                                    test_size=0.30,
                                                    random_state=1,
                                                    stratify=df.Y)
results = []
list_of_models = []
for val in X_train:
    model = XGBClassifier()
    # every iteration still fits on all columns; no variable is discarded
    model.fit(X_train, y_train)
    list_of_models.append(model)
    preds_train = model.predict(X_train)
    preds_test = model.predict(X_test)
    preds_prob_train = model.predict_proba(X_train)[:, 1]
    preds_prob_test = model.predict_proba(X_test)[:, 1]
    results.append({"AUC_train": round(metrics.roc_auc_score(y_train, preds_prob_train), 3),
                    "AUC_test": round(metrics.roc_auc_score(y_test, preds_prob_test), 3)})
results = pd.DataFrame(results)
How can I do that in Python?
Do you want to make your data narrower on each pass through the loop? If I understand this correctly, you could do something like this:
results = []
list_of_models = []
# X_train.columns is evaluated once here, so reassigning X_train in the loop is safe
for i in X_train.columns:
    model = XGBClassifier()
    model.fit(X_train, y_train)
    list_of_models.append(model)
    preds_train = model.predict(X_train)
    preds_test = model.predict(X_test)
    preds_prob_train = model.predict_proba(X_train)[:, 1]
    preds_prob_test = model.predict_proba(X_test)[:, 1]
    results.append({"Model": len(list_of_models) - 1,
                    "Num_var": X_train.shape[1],
                    "AUC_train": round(metrics.roc_auc_score(y_train, preds_prob_train), 3),
                    "AUC_test": round(metrics.roc_auc_score(y_test, preds_prob_test), 3)})
    # drop the current column so the next model sees one variable fewer
    X_train = X_train.drop(i, axis=1)
    X_test = X_test.drop(i, axis=1)
results = pd.DataFrame(results)
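If you would rather not overwrite X_train and X_test, the same loop could be wrapped in a small helper that works on a shrinking list of column names. This is only a sketch: `auc_per_dropped_feature` and its `make_model` parameter are names I made up for illustration, and it assumes the variables are discarded in column order (X1 first), which the question does not actually specify.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score


def auc_per_dropped_feature(X_train, X_test, y_train, y_test, make_model=None):
    """Fit one model per feature subset, discarding the first remaining column each round.

    Hypothetical helper: returns (list_of_models, results DataFrame).
    """
    if make_model is None:
        # Imported lazily so any scikit-learn style classifier can be
        # passed in instead when xgboost is not wanted.
        from xgboost import XGBClassifier
        make_model = XGBClassifier

    list_of_models = []
    rows = []
    cols = list(X_train.columns)
    while cols:
        model = make_model()
        model.fit(X_train[cols], y_train)
        list_of_models.append(model)

        # AUC is computed from the predicted probability of the positive class
        prob_train = model.predict_proba(X_train[cols])[:, 1]
        prob_test = model.predict_proba(X_test[cols])[:, 1]
        rows.append({
            "Model": len(list_of_models) - 1,   # position in list_of_models
            "Num_var": len(cols),               # predictors used by this model
            "AUC_train": round(roc_auc_score(y_train, prob_train), 3),
            "AUC_test": round(roc_auc_score(y_test, prob_test), 3),
        })

        # Assumption: discard variables in column order before the next model
        cols = cols[1:]

    return list_of_models, pd.DataFrame(rows)

# Intended usage with the split from the question:
# list_of_models, results = auc_per_dropped_feature(X_train, X_test, y_train, y_test)
```

Because the original frames are never modified, you can rerun it with a different column order, or pass another classifier class as `make_model`, without redoing the train/test split.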