
Convert an instance of xgboost.Booster into a model that implements the scikit-learn API


I am trying to use mlflow to save a model and then load it later to make predictions.

I'm using an xgboost.XGBRegressor model and its scikit-learn-style methods (.predict(), and .predict_proba() in the case of xgboost.XGBClassifier) to make predictions. However, mlflow doesn't seem to keep models that implement the scikit-learn API: when I load the model back from mlflow, it returns an instance of xgboost.Booster. A Booster's .predict() only accepts an xgboost.DMatrix rather than an array or DataFrame, and it has no .predict_proba() at all.

Is there a way to convert an xgboost.Booster back into an xgboost.sklearn.XGBRegressor object that implements the scikit-learn API methods?


Solution

  • Have you tried wrapping your model in a custom class that subclasses mlflow.pyfunc.PythonModel, then logging and loading it with mlflow.pyfunc? I put together a simple example, and after loading the model back it correctly reports <class 'xgboost.sklearn.XGBRegressor'> as its type.

    Example:

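    import mlflow
    import mlflow.pyfunc
    import xgboost as xgb

    # X_train, y_train and X_test stand for your own data
    xg_reg = xgb.XGBRegressor(...)
    xg_reg.fit(X_train, y_train)  # the model must be fitted before it can predict

    class CustomModel(mlflow.pyfunc.PythonModel):
        def __init__(self, xgbRegressor):
            # Keep the fitted scikit-learn wrapper; it is serialized with this object
            self.xgbRegressor = xgbRegressor

        def predict(self, context, model_input):
            # Show that the wrapped object is still an XGBRegressor after loading
            print(type(self.xgbRegressor))
            return self.xgbRegressor.predict(model_input)

    # Log the model to the local tracking directory (./mlruns by default)
    with mlflow.start_run() as run:
        custom_model = CustomModel(xg_reg)
        mlflow.pyfunc.log_model("custom_model", python_model=custom_model)

    # Load the model back from this run's artifacts
    model = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/custom_model")
    model.predict(X_test)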
    

    Output:

    <class 'xgboost.sklearn.XGBRegressor'>
    [ 9.107417 ]