In Python, I have trained a FLAML AutoML model (at the moment I am only interested in training an XGBoost for a classification problem in which the binary target is called y_train and the set of predictive features is called X_train) using this code:
from flaml import AutoML
automl = AutoML()
automl.fit(
    X_train,
    y_train,
    estimator_list=["xgboost"],
    task="classification",
    metric="roc_auc",
    eval_method="cv",
    n_splits=3,
    time_budget=30,
    sample=True,
    append_log=True,
    log_type="all",
    model_history=True,
    log_training_metric=True,
    verbose=3,
    seed=1234,
    early_stop=True
)
Then, I have checked some model details and calculated the Gini with this code:
from sklearn.metrics import confusion_matrix, precision_recall_curve, roc_curve, auc, log_loss
print("Best estimator: ",automl.best_estimator)
print("Best HP: ",automl.best_model_for_estimator)
predProba = automl.predict_proba(X_train)
# Second column of predict_proba holds the probability of the positive class
probability = predProba[:, 1]
[fpr, tpr, thr] = roc_curve(y_train, probability)
Gini = round((auc(fpr, tpr)-0.5)*200,1)
print("autoML Gini: ",str(Gini))
The Gini is 39.7.
So far, so good.
Then, I wanted to insert some constraints into the algorithm. The constraints are:
monotone constraints:
monotone_constraints='(1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1)'
interaction constraints:
interaction_constraints=[]
So, here is the code I have used to insert the constraints in the autoML:
custom_hp = {
"monotone_constraints" : '(1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1)',
"interaction_constraints" : []
}
automl = AutoML(**custom_hp)
automl.fit(
    X_train,
    y_train,
    estimator_list=["xgboost"],#"lgbm",
    task="classification",
    metric="roc_auc",
    eval_method="cv",
    n_splits=3,
    time_budget=30,
    sample=True,
    append_log=True,
    log_type="all",
    model_history=True,
    log_training_metric=True,
    verbose=3,
    seed=1234,
    early_stop=True
)
I then checked that the constraints were actually picked up by the autoML:
print("Best HP: ",automl.best_model_for_estimator)
And it looks like it has:
Best HP: <bound method AutoML.best_model_for_estimator of AutoML(append_log=False,
auto_augment=True, custom_hp={}, early_stop=False,
ensemble=False, estimator_list='auto', eval_method='auto',
fit_kwargs_by_estimator={}, hpo_method='auto',
**interaction_constraints=[]**, keep_search_state=False,
learner_selector='sample', log_file_name='', log_training_metric=False,
log_type='better', max_iter=None, mem_thres=4294967296, metric='auto',
metric_constraints=[], min_sample_size=10000, model_history=False,
**monotone_constraints='(1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1)'**,
n_concurrent_trials=1, n_jobs=-1, n_splits=5, pred_time_limit=inf,
retrain_full=True, sample=True, split_ratio=0.1, split_type='auto', ...)>
However, when I calculate the Gini, it is the same as for the model without constraints:
predProba = automl.predict_proba(X_train)
# Second column of predict_proba holds the probability of the positive class
probability = predProba[:, 1]
[fpr, tpr, thr] = roc_curve(y_train, probability)
Gini = round((auc(fpr, tpr)-0.5)*200,1)
print("autoML Gini: ",str(Gini))
The Gini is: 39.7
Question: Has anyone experienced this before? How is it possible that the autoML details show the constraints have been picked up, yet the Gini doesn't change?
I have trained the XGBoost outside of the autoML process to verify that the Gini would change based on whether the constraints were applied or not. And the Gini actually changes, so I'd expect the same results from the autoML.
Can anyone help me please?
The form of your constraints may be off. Per the docs, it should be of the form custom_hp = {'<model name>': {'<parameter name>': {'domain': <parameter value(s)>}}}
So maybe try
custom_hp = {
    'xgboost': {
        'monotone_constraints': {
            'domain': '(1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1)'
        },
        'interaction_constraints': {
            'domain': []
        }
    }
}
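If the constraint list changes between experiments, the nested dictionary can be generated instead of typed by hand. This is only a minimal sketch, assuming the nested layout shown above is the right one; build_custom_hp is a hypothetical helper name of my own and is not part of FLAML.

```python
def build_custom_hp(estimator_name, monotone_directions, interaction_groups=None):
    """Hypothetical helper (not part of FLAML).

    Nests fixed values as {estimator: {parameter: {"domain": value}}}.
    """
    if any(d not in (-1, 0, 1) for d in monotone_directions):
        raise ValueError("monotone directions must be -1, 0 or 1")
    # Render the directions in the same string form used in the question,
    # e.g. "(1, -1, 0)".
    monotone = "(" + ", ".join(str(d) for d in monotone_directions) + ")"
    return {
        estimator_name: {
            "monotone_constraints": {"domain": monotone},
            "interaction_constraints": {"domain": list(interaction_groups or [])},
        }
    }


custom_hp = build_custom_hp("xgboost", [1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1])
print(custom_hp)
```

The result has the same shape as the hand-written dictionary, so it can be passed on in the same way.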
Edit: Additionally, it seems you can't pass this argument to the constructor as AutoML(**custom_hp). Use AutoML(custom_hp=custom_hp) instead or, if you prefer the ** syntax,
automl_settings = {'custom_hp': custom_hp}
automl = AutoML(**automl_settings)
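To see why the two spellings differ, here is a small plain-Python sketch. SettingsHolder is a hypothetical stand-in that simply stores whatever keyword arguments it receives; it is not FLAML's implementation and only illustrates how ** unpacking behaves.

```python
class SettingsHolder:
    """Hypothetical stand-in for a constructor that accepts **settings."""

    def __init__(self, **settings):
        self.settings = settings


flat = {
    "monotone_constraints": "(1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1)",
    "interaction_constraints": [],
}

# Unpacking the dictionary itself turns each key into its own keyword
# argument, so the receiver never sees anything called "custom_hp".
unpacked = SettingsHolder(**flat)
print("custom_hp" in unpacked.settings)   # False
print(sorted(unpacked.settings))          # ['interaction_constraints', 'monotone_constraints']

# Wrapping it first delivers the whole dictionary under the "custom_hp" key.
nested = {"xgboost": {name: {"domain": value} for name, value in flat.items()}}
automl_settings = {"custom_hp": nested}
wrapped = SettingsHolder(**automl_settings)
print(sorted(wrapped.settings))           # ['custom_hp']
```

This is consistent with the printout in the question, where monotone_constraints and interaction_constraints appear as top-level settings while custom_hp={} is still empty.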