azure-machine-learning-service azureml-python-sdk

How do I create a DataSet with Data Type: ModelDirectory in Azure Machine Learning Studio?


I'm attempting to manually create a DataSet with Data Type: ModelDirectory in Azure Machine Learning Studio, in order to use it in an Inference Pipeline. I have taken an existing ModelDirectory DataSet and attempted to replicate it. Everything is identical, except that the replica has Data Type: AnyDirectory and cannot be hooked up to the input of a ScoreModel node in the designer. How can I (manually in the UI or, better yet, programmatically) create a DataSet with Data Type: ModelDirectory from the output files of a trained model?

Existing DataSet: [screenshot] Existing DataSet outputs: [screenshot]

Manually Created Replica DataSet: [screenshot] Manually Created Replica DataSet outputs: [screenshot]

As you can see, the outputs of both DataSets are identical. The only difference between the two DataSets seems to be the 'Data Type' property, although in the output view you can see that both have 'type: ModelDirectory'.


Solution

  • I just spent a few hours on this exact problem. To save anyone else the wasted time: it looks like you cannot create a dataset with ModelDirectory type in Azure ML SDK V2.

    On Microsoft's support site, another user had this exact problem, and filed a support ticket (thread here - https://learn.microsoft.com/en-us/answers/questions/398468/how-do-i-create-a-modeldirectory-type-filedataset). Microsoft directly told them that this is not something exposed to users.

    Reproducing the post here in case the link goes down (emphasis mine):

    I managed to take this all the way to the Microsoft product development team, and the answer (paraphrased) was as follows. This ModelDirectory class is 100% abstracted from the users, and thus cannot be called from within a user-defined block.

    The group told me that the designer is not meant to be a production-grade tool in its current implementation (it is more of a prototyping tool), and it is not compatible with CI/CD processes the way Python scripts are. Automated retraining in the designer is not a supported scenario as of today either, since we can't create an API for the training pipeline. I can't speak to the roadmap, but I did advocate that some method of enabling automation and MLOps in the designer would be highly desirable.

    So I don't think you're missing anything. In the end we had to work with the team to convert the prototyped pipelines into Python equivalents.

    Which is a real shame, because Azure ML is otherwise a pretty okay system. Ultimately what I did was what the post's author did: I wrote my own versions of the nodes I wanted to use.
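For anyone wondering what "my own version of a node" can look like, here is a minimal sketch of a stand-in for the Score Model step, written as a plain Python script. Everything in it is an assumption made for illustration rather than anything Azure ML prescribes: the model is a toy linear model stored as a hypothetical `model.json` inside the model directory, and the `--model-dir`, `--input-csv` and `--output-csv` arguments, like the function names, are invented for this example. A real replacement would load whatever format your training step actually writes.

```python
import argparse
import csv
import json
from pathlib import Path


def load_model(model_dir):
    """Load a hypothetical linear model saved as model.json in model_dir."""
    with open(Path(model_dir) / "model.json") as f:
        # Assumed layout: {"weights": {feature_name: weight}, "bias": float}
        return json.load(f)


def score_rows(model, rows):
    """Return one prediction per row; each row maps feature name -> value."""
    weights, bias = model["weights"], model["bias"]
    return [
        bias + sum(w * float(row[name]) for name, w in weights.items())
        for row in rows
    ]


def main(argv=None):
    """Read a model directory and an input CSV, write a predictions CSV."""
    parser = argparse.ArgumentParser(description="Stand-in for a Score Model node")
    parser.add_argument("--model-dir", required=True)
    parser.add_argument("--input-csv", required=True)
    parser.add_argument("--output-csv", required=True)
    args = parser.parse_args(argv)

    model = load_model(args.model_dir)
    with open(args.input_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    predictions = score_rows(model, rows)

    with open(args.output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prediction"])
        writer.writerows([p] for p in predictions)
```

Because the script only takes paths as arguments, it never needs the ModelDirectory type: the model files are just a folder it reads from. Calling `main()` from the script's entry point lets the same file run locally or as a script step in a Python-defined pipeline.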