I built a machine learning model:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, y_train)
which I can save to the filestore by:
import pickle
filename = "/dbfs/FileStore/lr_model.pkl"
with open(filename, 'wb') as f:
    pickle.dump(lr, f)
Ideally, I wanted to save the model directly to a workspace or a repo, so I tried:
import os
filename = "/Users/user/lr_model.pkl"
os.makedirs(os.path.dirname(filename), exist_ok=True)
with open(filename, 'wb') as f:
    pickle.dump(lr, f)
but it does not work: the file never shows up in the workspace.
The only alternative I have now is to transfer the model from the filestore to the workspace or a repo. How do I go about that?
When you store a file in DBFS (/FileStore/...), it lives in your own account (the data plane), while notebooks, etc. live in the Databricks account (the control plane). By design, you can't import non-code objects into a workspace. But Repos now has support for arbitrary files, although only in one direction: you can access files in Repos from your cluster running in the data plane, but you can't write into Repos (at least not now). You can:
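For example, a file that has already been committed into the repo can be read from a notebook running on the cluster. A minimal sketch, assuming the hypothetical repo path below and that your Databricks Runtime exposes repos under /Workspace/Repos (this depends on runtime version and workspace settings):

import pickle

# Hypothetical path to a repo checkout -- replace with your own user and repo.
repo_path = "/Workspace/Repos/user@example.com/my-repo/lr_model.pkl"

# Reading a file that is already committed in the repo works from the data plane.
with open(repo_path, "rb") as f:
    lr_from_repo = pickle.load(f)

# The reverse direction -- writing into Repos from the cluster -- is not
# supported (per the note above), so the equivalent of this would fail:
# with open(repo_path, "wb") as f:
#     pickle.dump(lr, f)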
But really, you should use MLflow, which is built into Azure Databricks. It will log the model file, hyper-parameters, and other run information for you, and you can then work with the model via APIs, command-line tools, etc., for example to move it between the Staging and Production stages using the Model Registry, deploy it to Azure ML, and so on.
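For example, a minimal sketch of that MLflow workflow; the model name "lr_model" and the Staging stage used below are placeholders, and X_train / y_train are the same training data as in the question:

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

with mlflow.start_run():
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    # Track whatever is useful alongside the model itself.
    mlflow.log_param("fit_intercept", lr.fit_intercept)
    # registered_model_name also creates/updates an entry in the Model Registry.
    mlflow.sklearn.log_model(lr, "model", registered_model_name="lr_model")

# Later, load the model by name and stage from the registry instead of juggling
# pickle files (this assumes a version has been transitioned to Staging):
loaded_lr = mlflow.sklearn.load_model("models:/lr_model/Staging")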