I'm using Databricks and trying to log a model to MLflow using the Feature Store log_model function:
fs.log_model(
model,
artifact_path="fs_model",
flavor=mlflow.sklearn,
training_set=fs_training_set,
)
The script is running in a workflow on a job cluster with Databricks Runtime 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12).
Here are the logs:
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
62 if __name__ == "__main__":
63 job = ModelTrainJob()
---> 64 job.launch()
/tmp/tmp51ge7k75.py in launch(self)
56 env_vars=self.env_vars,
57 )
---> 58 ModelTrain(cfg).run()
59 _logger.info("ModelTrainJob job finished!")
60
/databricks/python/lib/python3.8/site-packages/customer_churn/objects/model_train.py in run(self)
215 # Log model using Feature Store API
216 _logger.info("Logging model to MLflow using Feature Store API")
--> 217 fs.log_model(
218 model,
219 artifact_path="fs_model",
/databricks/.python_edge_libs/databricks/feature_store/client.py in log_model(self, model, artifact_path, flavor, training_set, registered_model_name, await_registration_for, **kwargs)
2106 # the databricks-feature-store package is not available via conda or pip.
2107 conda_file = raw_mlflow_model.flavors["python_function"][mlflow.pyfunc.ENV]
-> 2108 conda_env = read_yaml(raw_model_path, conda_file)
2109
2110 # Get the pip package string for the databricks-feature-lookup client
/databricks/python/lib/python3.8/site-packages/mlflow/utils/file_utils.py in read_yaml(root, file_name)
210 )
211
--> 212 file_path = os.path.join(root, file_name)
213 if not exists(file_path):
214 raise MissingConfigException("Yaml file '%s' does not exist." % file_path)
/usr/lib/python3.8/posixpath.py in join(a, *p)
88 path += sep + b
89 except (TypeError, AttributeError, BytesWarning):
---> 90 genericpath._check_arg_types('join', a, *p)
91 raise
92 return path
/usr/lib/python3.8/genericpath.py in _check_arg_types(funcname, *args)
150 hasbytes = True
151 else:
--> 152 raise TypeError(f'{funcname}() argument must be str, bytes, or '
153 f'os.PathLike object, not {s.__class__.__name__!r}') from None
154 if hasstr and hasbytes:
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'
I found this post: "python TypeError: join() argument must be str, bytes, or os.PathLike object, not 'list'", which suggests the problem is with the line
file_path = os.path.join(root, file_name)
But this is a few layers deep in the MLflow code.
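For what it's worth, my working theory (an assumption on my part, not something I confirmed in the Databricks source): newer MLflow model formats store the python_function flavor's env entry as a dict (e.g. {"conda": "conda.yaml", "virtualenv": "python_env.yaml"}) rather than a plain filename string, while the older feature-store client shipped with 10.4 LTS ML passes that value straight into os.path.join. Passing a dict there reproduces the exact error from the traceback:

```python
import os

# Minimal reproduction: os.path.join rejects a dict as its second
# argument, which is what happens if the pyfunc "env" flavor entry
# is a dict (newer MLflow model format) instead of a filename string.
conda_file = {"conda": "conda.yaml", "virtualenv": "python_env.yaml"}
try:
    os.path.join("/tmp/fs_model", conda_file)
except TypeError as e:
    print(e)  # join() argument must be str, bytes, or os.PathLike object, not 'dict'
```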
I figured out the answer to my question, so I'm posting it in case someone else hits the same issue.
The error was caused by my using Databricks Runtime 10.4 LTS ML.
When I upgraded to 12.1 LTS ML the error went away.
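If you want to fail fast instead of hitting this deep inside fs.log_model, a small guard on the runtime version can help. This is only a sketch under assumptions: DATABRICKS_RUNTIME_VERSION is the environment variable Databricks sets with the runtime string, and 12.1 is simply the first version I confirmed working (earlier runtimes above 10.4 may also be fine).

```python
import os

def runtime_major_version(default="0.0"):
    # Extract e.g. "12.1" from "12.1.x-cpu-ml-scala2.12" or plain "12.1".
    raw = os.environ.get("DATABRICKS_RUNTIME_VERSION", default)
    parts = raw.split(".")
    return f"{parts[0]}.{parts[1]}" if len(parts) >= 2 else raw

# Warn early rather than failing inside fs.log_model; 12.1 is the
# version that worked for me, not a documented minimum.
if float(runtime_major_version()) < 12.1:
    print("Runtime may be too old for this MLflow model format")
```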