I set up an Optuna objective function to find the best LightGBM/XGBoost model for my data, but I was wondering if I can take the best model and use it directly in my notebook (extracting the best model as an object to reuse later). Here is my objective function:
import lightgbm as lgb
import optuna
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

best_booster = None
gbm = None
def objective(trial, random_state=22, n_jobs=1, early_stopping_rounds=50):
    global gbm  # expose the fitted model so the callback below can pick it up
    regressor_name = trial.suggest_categorical("regressor", ["XGBoost", "lightgbm"])
    # X_train and y_train are defined earlier in my notebook
    train_x, valid_x, train_y, valid_y = train_test_split(X_train, y_train, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)
    # Step 2. Set up values for the hyperparameters:
    if regressor_name == "XGBoost":
        params = {
            "verbosity": 0,  # 0 (silent) - 3 (debug)
            "objective": "reg:squarederror",
            "n_estimators": 10000,
            "max_depth": trial.suggest_int("max_depth", 4, 12),
            "learning_rate": trial.suggest_float("learning_rate", 0.005, 0.05, log=True),
            "colsample_bytree": trial.suggest_float("colsample_bytree", 0.2, 0.6),
            "subsample": trial.suggest_float("subsample", 0.4, 0.8),
            "reg_alpha": trial.suggest_float("alpha", 0.01, 10.0, log=True),
            "reg_lambda": trial.suggest_float("lambda", 1e-8, 10.0, log=True),
            "gamma": trial.suggest_float("gamma", 1e-8, 10.0, log=True),  # was sampling "lambda" twice
            "min_child_weight": trial.suggest_float("min_child_weight", 10, 1000, log=True),
            "random_state": random_state,
            "n_jobs": n_jobs,
        }
        model = XGBRegressor(**params, early_stopping_rounds=early_stopping_rounds)
        model.fit(train_x, train_y, eval_set=[(valid_x, valid_y)], verbose=False)
        gbm = model
        y_pred = model.predict(valid_x)  # evaluate on the held-out split, not X_val
        return mean_absolute_error(valid_y, y_pred)
    else:
        params = {
            "objective": "regression",  # this is a regression task, not binary classification
            "metric": "l1",             # match the MAE used as the study objective
            "verbosity": -1,
            "boosting_type": "gbdt",
            "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
            "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
            "num_leaves": trial.suggest_int("num_leaves", 2, 256),
            "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
            "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
            "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
            "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
            "seed": random_state,
        }
        gbm = lgb.train(params, dtrain)
        preds_gbm = gbm.predict(valid_x)
        return mean_absolute_error(valid_y, preds_gbm)
And here is how I tried to solve this issue:
def callback(study, trial):
    global best_booster
    if study.best_trial.number == trial.number:  # compare trial numbers, not trial objects
        best_booster = gbm

if __name__ == "__main__":
    study = optuna.create_study(direction="minimize")  # MAE must be minimized, not maximized
    study.optimize(objective, n_trials=100, callbacks=[callback])
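Another idea I came across would be to skip the global entirely and attach the fitted model to the trial itself with trial.set_user_attr (I am not sure this is the intended use, and as far as I know keeping a live object like this only works with Optuna's default in-memory storage). A toy sketch with made-up data:

import optuna
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)
tr_x, va_x, tr_y, va_y = train_test_split(X, y, random_state=0)

def objective(trial):
    model = XGBRegressor(
        max_depth=trial.suggest_int("max_depth", 2, 8),
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
    )
    model.fit(tr_x, tr_y)
    trial.set_user_attr("fitted_model", model)  # attach the trained object to this trial
    return mean_absolute_error(va_y, model.predict(va_x))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
best_model = study.best_trial.user_attrs["fitted_model"]  # the winner, ready to reuse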
I think it's about importing something; also, if you have any tips on my Optuna function, please share them.
If I understood your question correctly, then yes, that's what models are for.
Bring your saved model into your notebook, feed it data with the same structure as the data you trained it on, and it will serve its purpose. Or use it in a pipeline.
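For example, here is a minimal sketch of the pipeline route; the scaler, the hyperparameters, and the synthetic data are placeholders, not taken from your setup:

from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Preprocessing and the tuned model travel together as one reusable object.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", XGBRegressor(max_depth=4, n_estimators=100)),
])
pipe.fit(X, y)
print(pipe.predict(X[:1]))  # same call signature as the bare model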
Even a single row with the same structure, as a NumPy array, can be used. For example, my model predicts whether a loan should be approved or not.
A bank customer wants a loan and submits their information, and the bank officer enters it into the system. The system transforms this information into a single NumPy array with the same structure as the dataset used to train the model.
The system then uses the model to predict whether the loan should be approved or not.
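In code, that hand-off looks roughly like this; the logistic regression and all the numbers are stand-ins for the real loan model and applicant data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for the trained loan model: 4 features (say age, income, loan amount, credit score).
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(100, 4))
y_hist = (X_hist[:, 3] > 0).astype(int)  # toy approval rule
clf = LogisticRegression().fit(X_hist, y_hist)

# One applicant, shaped as a single row with the training column order.
applicant = np.array([[0.2, -0.5, 1.1, 0.8]])
print("approved" if clf.predict(applicant)[0] == 1 else "rejected")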
I save my Optuna-tuned XGBoost models as JSON, e.g.
my_model.get_booster().save_model(f'{savepath}my_model.json')
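Loading it back in the notebook is then a single call (same savepath as above; load_model restores the trained booster into a fresh estimator):

from xgboost import XGBRegressor

reloaded = XGBRegressor()
reloaded.load_model(f'{savepath}my_model.json')  # restore the trained booster
# reloaded.predict(new_rows) now behaves like the original model

And if you only need the winning hyperparameters rather than the trained object, study.best_params is available once optimize() finishes, so you can simply refit your regressor with those values.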