I got this error when serving a model on Databricks using MLflow:
Unrecognized content type parameters: format. IMPORTANT: The MLflow Model scoring protocol has changed in MLflow version 2.0. If you are seeing this error, you are likely using an outdated scoring request format. To resolve the error, either update your request format or adjust your MLflow Model's requirements file to specify an older version of MLflow (for example, change the 'mlflow' requirement specifier to 'mlflow==1.30.0'). If you are making a request using the MLflow client (e.g. via mlflow.pyfunc.spark_udf()), upgrade your MLflow client to a version >= 2.0 in order to use the new request format. For more information about the updated MLflow Model scoring protocol in MLflow 2.0, see https://mlflow.org/docs/latest/models.html#deploy-mlflow-models.
I'm looking for the right format to use for my JSON input, as the format I am currently using looks like this example:
[
    {
        "input1": 12,
        "input2": 290.0,
        "input3": "red"
    }
]
I don't really know if it's related to my MLflow version (currently I'm using mlflow==1.24.0). I cannot update the version, as I do not have the necessary privileges.
I have also tried the solution suggested here and got:
TypeError: spark_udf() got an unexpected keyword argument 'env_manager'
I have not found any documentation so far that solves this issue.
Thank you in advance for your help.
When you log the model, your MLflow version is 1.24, but when you serve it as an API on Databricks, a new environment is created for it, and that environment installs MLflow 2.0+. As the error message suggests, you can either pin the MLflow version or update the request format.
If you are using Classic Model Serving, you should pin the version; if you are using Serverless Model Serving, you should update the request format. If you must use Classic Model Serving and do not want to upgrade, scroll to the bottom.
When logging the model, you can specify a new Conda environment or add extra pip requirements that are used when the model is served.
pip
# log model with mlflow==1.* specified
mlflow.<flavor>.log_model(..., extra_pip_requirements=["mlflow==1.*"])
Conda
# get default conda env
conda_env = mlflow.<flavor>.get_default_conda_env()
print(conda_env)
# specify mlflow==1.*
conda_env = {
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.9.5",
        "pip<=21.2.4",
        {"pip": ["mlflow==1.*", "cloudpickle==2.0.0"]},
    ],
    "name": "mlflow-env",
}
# log model with new conda_env
mlflow.<flavor>.log_model(..., conda_env=conda_env)
An alternative is to update the JSON request format, but this will only work if you are using Databricks Serverless Model Serving.
In the MLflow docs linked at the end of the error message, you can see all the supported formats. Given the data you provided, I would suggest using dataframe_split or dataframe_records.
{
    "dataframe_split": {
        "columns": ["input1", "input2", "input3"],
        "data": [[12, 290.0, "red"]]
    }
}
{
    "dataframe_records": [
        {
            "input1": 12,
            "input2": 290.0,
            "input3": "red"
        }
    ]
}
If you are using Classic Model Serving, don't want to pin the MLflow version, and want to use the UI for inference, DO NOT log an input_example when you log the model. I know this does not follow MLflow "best practice", but based on some investigation, I believe there is an issue with Databricks when you do this.
When you log an input_example, MLflow logs information about the example, including its type and pandas_orient. This information is used to generate the inference recipe. As you can see in the generated curl command, it sets format=pandas-records (the JSON body is not generated). But this returns the Unrecognized content type... error.
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json; format=pandas-records" \
  -d '{
    "dataframe_split": {
      "columns": ["input1", "input2", "input3"],
      "data": [[12, 290.0, "red"]]
    }
  }' \
  https://<url>/model/<model>/<version>/invocations
For me, when I removed format=pandas-records entirely, everything worked as expected. Because of this, I believe that if you log an example and use the UI, Databricks adds this format parameter to the request for you, which results in an error even if you did everything correctly, while in Serverless the generated curl does not include this parameter at all.