pyspark, databricks, mlflow, feature-store

Logging a model to MLflow using the Feature Store API: TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'


I'm using Databricks and trying to log a model to MLflow using the Feature Store log_model function:

fs.log_model(
    model,
    artifact_path="fs_model",
    flavor=mlflow.sklearn,
    training_set=fs_training_set,
)
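
For context, fs and fs_training_set are created along these lines earlier in the script (simplified; the feature table name, lookup key, and label are placeholders, not my real ones):

from databricks.feature_store import FeatureStoreClient, FeatureLookup

fs = FeatureStoreClient()

# Placeholder feature table and lookup key
feature_lookups = [
    FeatureLookup(
        table_name="feature_store.customer_features",
        lookup_key="customer_id",
    )
]

# Joins the label DataFrame against the feature table to build the training set
fs_training_set = fs.create_training_set(
    df=labels_df,
    feature_lookups=feature_lookups,
    label="churn_label",
)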

The script runs as a Databricks Workflow on a job cluster with Databricks Runtime 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12).

Here is the full traceback:

TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
     62 if __name__ == "__main__":
     63     job = ModelTrainJob()
---> 64     job.launch()

/tmp/tmp51ge7k75.py in launch(self)
     56             env_vars=self.env_vars,
     57         )
---> 58         ModelTrain(cfg).run()
     59         _logger.info("ModelTrainJob job finished!")
     60 

/databricks/python/lib/python3.8/site-packages/customer_churn/objects/model_train.py in run(self)
    215             # Log model using Feature Store API
    216             _logger.info("Logging model to MLflow using Feature Store API")
--> 217             fs.log_model(
    218                 model,
    219                 artifact_path="fs_model",

/databricks/.python_edge_libs/databricks/feature_store/client.py in log_model(self, model, artifact_path, flavor, training_set, registered_model_name, await_registration_for, **kwargs)
   2106             # the databricks-feature-store package is not available via conda or pip.
   2107             conda_file = raw_mlflow_model.flavors["python_function"][mlflow.pyfunc.ENV]
-> 2108             conda_env = read_yaml(raw_model_path, conda_file)
   2109 
   2110             # Get the pip package string for the databricks-feature-lookup client

/databricks/python/lib/python3.8/site-packages/mlflow/utils/file_utils.py in read_yaml(root, file_name)
    210         )
    211 
--> 212     file_path = os.path.join(root, file_name)
    213     if not exists(file_path):
    214         raise MissingConfigException("Yaml file '%s' does not exist." % file_path)

/usr/lib/python3.8/posixpath.py in join(a, *p)
     88                 path += sep + b
     89     except (TypeError, AttributeError, BytesWarning):
---> 90         genericpath._check_arg_types('join', a, *p)
     91         raise
     92     return path

/usr/lib/python3.8/genericpath.py in _check_arg_types(funcname, *args)
    150             hasbytes = True
    151         else:
--> 152             raise TypeError(f'{funcname}() argument must be str, bytes, or '
    153                             f'os.PathLike object, not {s.__class__.__name__!r}') from None
    154     if hasstr and hasbytes:

TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'

I found this related post: python TypeError: join() argument must be str, bytes, or os.PathLike object, not 'list',

which suggests there is an issue with the line

file_path = os.path.join(root, file_name)

But this call is a few layers deep in the MLflow code, not something in my own script.
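
To confirm what the Feature Store client is handing to os.path.join, you can save the same model with plain mlflow.sklearn and print the env entry of its python_function flavor (the local path below is a placeholder):

import mlflow
from mlflow.models import Model

# Save the model locally and read back the MLmodel metadata MLflow writes
mlflow.sklearn.save_model(model, path="/tmp/debug_model")
raw_model = Model.load("/tmp/debug_model")

# mlflow.pyfunc.ENV is the "env" key of the python_function flavor -- the
# value that read_yaml() tries to os.path.join onto the model path. Per the
# traceback above, this prints a dict rather than a str.
print(raw_model.flavors["python_function"][mlflow.pyfunc.ENV])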


Solution

  • I figured out the answer to my question, so I'm posting it in case someone else hits the same issue.

    The error was caused by running on Databricks Runtime 10.4 LTS ML.

    When I upgraded the job cluster to 12.1 LTS ML, the error went away.
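
    As far as I can tell, the underlying cause is the env entry of the model's python_function flavor: newer MLflow versions write it as a dict ({"conda": ..., "virtualenv": ...}) rather than the plain string older versions used, and the Feature Store client bundled with 10.4 LTS ML passes that value straight to read_yaml/os.path.join as if it were a string, while the client in 12.1 LTS ML handles the newer format.

    If you hit this in a Jobs workflow, the fix is just bumping spark_version in the job cluster spec. A minimal sketch as a Python dict for the Jobs API (node_type_id and num_workers are placeholders for your own values):

    # Job cluster spec -- only the spark_version bump matters here
    new_cluster = {
        "spark_version": "12.1.x-cpu-ml-scala2.12",  # was "10.4.x-cpu-ml-scala2.12"
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    }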