Tags: azure, open-source, large-language-model, azure-machine-learning-service, gemma

Deploy an open-source LLM in the Azure ecosystem and create an API endpoint


I am looking to deploy an open-source LLM such as Gemma 3 4B in the Azure ecosystem, but I couldn't find this model in the model catalog of Azure Machine Learning Studio. I usually run such models locally with LM Studio or Ollama, but now I want to deploy the model and expose it through an API endpoint.

Please help me with the best practices, preferably a serverless endpoint.


Solution

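  • You're right: Gemma 3 4B is not (at least for now) available in the Azure ML model catalog. Serverless API deployments are offered only for supported catalog models, so what you can do instead is register the model yourself and deploy it manually to a real-time (managed online) endpoint.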

    In your Azure Machine Learning workspace, register the model by uploading its files.

    The wizard then shows the registration options.

    Give the model a name and select its framework; in your case that is PyTorch or TensorFlow, depending on the files you upload.

    Make sure the file format is correct, because the scoring script you write later has to load the model from these files and generate text.
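
    Before uploading, a quick local check can catch a wrong format early. The sketch below assumes the Hugging Face Transformers layout (a `config.json`, `*.safetensors` or `pytorch_model*.bin` weights, and tokenizer files); the GGUF files that LM Studio and Ollama use are a different format and need a different loader. The `check_hf_model_dir` and `register_model` helpers and the asset name are hypothetical, and registration assumes the Azure ML Python SDK v2 (`azure-ai-ml`) with an already authenticated `MLClient`.

```python
from pathlib import Path

# Assumed minimal Hugging Face layout; extra files (chat templates, processor configs) are fine.
REQUIRED_FILES = ("config.json",)
TOKENIZER_FILES = ("tokenizer.json", "tokenizer.model")
WEIGHT_PATTERNS = ("*.safetensors", "pytorch_model*.bin")


def check_hf_model_dir(model_dir: str) -> list[str]:
    """Return the problems found in model_dir; an empty list means it looks loadable."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{model_dir} is not a directory"]
    problems = []
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    if not any((root / name).is_file() for name in TOKENIZER_FILES):
        problems.append("missing tokenizer files")
    if not any(any(root.glob(pattern)) for pattern in WEIGHT_PATTERNS):
        problems.append("no *.safetensors or pytorch_model*.bin weights")
        if any(root.glob("*.gguf")):
            problems.append("found GGUF files (LM Studio/Ollama format), not Transformers weights")
    return problems


def register_model(ml_client, model_dir: str, name: str = "gemma-3-4b-it"):
    """Upload model_dir as a custom model asset (requires azure-ai-ml)."""
    # Imported lazily so the check above runs without the Azure SDK installed.
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import Model

    model = Model(path=model_dir, type=AssetTypes.CUSTOM_MODEL, name=name)
    return ml_client.models.create_or_update(model)
```

    Registering through the studio UI, as described above, gives the same result; the SDK route is just easier to repeat.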

    After registering the model, click Deploy and select Real-time endpoint.

    Next, fill in the required details.

    In the Code + environment section, upload a scoring script. You have to write this script yourself: it loads the model and generates the text output.

    Refer to this documentation for writing the scoring script. You can reuse the code block mentioned here with one small change: load the model from local files. When the deployment runs on the Azure instance, the registered model files are placed under the directory given by the AZUREML_MODEL_DIR environment variable.
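
    A minimal `score.py` sketch using the `init()`/`run()` functions that Azure ML scoring scripts define. It assumes the registered files are Hugging Face Transformers files in a sub-folder named `gemma-3-4b-it` (hypothetical; use the folder name you uploaded), that `torch` and `transformers` are installed in the environment, and a request body of the form `{"prompt": "...", "max_new_tokens": 128}`, which is a schema chosen only for this example. It loads the checkpoint with `AutoModelForCausalLM` for text-only generation; depending on your Transformers version, the multimodal Gemma 3 checkpoints may need the model-specific class named on the model card.

```python
import json
import os

# Populated once by init() and reused by every run() call.
model = None
tokenizer = None

# Hypothetical: the folder name you uploaded when registering the model.
MODEL_SUBDIR = "gemma-3-4b-it"
DEFAULT_MAX_NEW_TOKENS = 256


def resolve_model_path(subdir: str = MODEL_SUBDIR) -> str:
    """Azure ML exposes the registered model's folder via AZUREML_MODEL_DIR."""
    return os.path.join(os.environ.get("AZUREML_MODEL_DIR", "."), subdir)


def parse_request(raw_data: str) -> tuple[str, int]:
    """Read the assumed JSON schema {"prompt": str, "max_new_tokens": int}."""
    payload = json.loads(raw_data)
    prompt = payload["prompt"]
    max_new_tokens = int(payload.get("max_new_tokens", DEFAULT_MAX_NEW_TOKENS))
    return prompt, max_new_tokens


def init():
    """Called once when the deployment container starts."""
    global model, tokenizer
    # Heavy imports live here so they only load inside the deployment.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    path = resolve_model_path()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16).to(device)
    model.eval()


def run(raw_data):
    """Called for every request with the raw request body."""
    prompt, max_new_tokens = parse_request(raw_data)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return {"generated_text": tokenizer.decode(new_tokens, skip_special_tokens=True)}
```

    For an instruction-tuned checkpoint you may get better answers by formatting the prompt with `tokenizer.apply_chat_template(...)` before tokenizing.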

    Next, the scoring script needs an environment that contains all its dependencies and packages, so create an environment from a base Docker image.
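
    One way to build that environment with the Python SDK v2 is a base Docker image plus a conda file. The package list, environment name and image tag below are assumptions (check Azure ML's curated base images for a current GPU tag, and pin versions that match your model); `azureml-inference-server-http` is the package that serves an `init()`/`run()` scoring script in a custom environment.

```python
from pathlib import Path

# Assumed dependencies for a Transformers-based score.py; pin exact versions in practice.
CONDA_YAML = """\
name: gemma-inference
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - azureml-inference-server-http
      - torch
      - transformers>=4.50  # assumed minimum version with Gemma 3 support
"""

# Assumed GPU base image; check Azure ML's curated base images for a current tag.
BASE_IMAGE = "mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04:latest"


def write_conda_file(path: str = "conda.yaml") -> Path:
    """Write the conda spec that Azure ML layers on top of the base image."""
    out = Path(path)
    out.write_text(CONDA_YAML)
    return out


def make_environment(conda_path: str, name: str = "gemma-inference-env"):
    """Build an Azure ML environment object (requires azure-ai-ml)."""
    from azure.ai.ml.entities import Environment

    return Environment(name=name, image=BASE_IMAGE, conda_file=conda_path)
```

    Registering it with `ml_client.environments.create_or_update(make_environment(str(write_conda_file())))` makes it selectable in the Code + environment step.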

    In the Compute section, select a GPU virtual machine size and the number of instances.
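
    The Compute choices map to `instance_type` and `instance_count` on a managed online deployment. The sketch below uses the Python SDK v2 to create a key-authenticated endpoint and a deployment, then routes all traffic to that deployment. The endpoint and deployment names, the VM size and the timeout are assumptions; choose a GPU size you have quota for. The request timeout is raised because generating text can take longer than the deployment's default timeout of 5 seconds.

```python
# Assumed values; adjust to your quota and model size.
ENDPOINT_NAME = "gemma-3-4b-endpoint"  # hypothetical, must be unique within the region
DEPLOYMENT_NAME = "blue"               # hypothetical
INSTANCE_TYPE = "Standard_NC24ads_A100_v4"
INSTANCE_COUNT = 1
REQUEST_TIMEOUT_MS = 90_000  # generation is slow; Azure ML caps this at 180000 ms


def deploy(ml_client, model, environment, code_dir: str = "./src"):
    """Create the endpoint and deployment, then send 100% of traffic to it."""
    from azure.ai.ml.entities import (
        CodeConfiguration,
        ManagedOnlineDeployment,
        ManagedOnlineEndpoint,
        OnlineRequestSettings,
    )

    endpoint = ManagedOnlineEndpoint(name=ENDPOINT_NAME, auth_mode="key")
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()

    deployment = ManagedOnlineDeployment(
        name=DEPLOYMENT_NAME,
        endpoint_name=ENDPOINT_NAME,
        model=model,
        environment=environment,
        code_configuration=CodeConfiguration(code=code_dir, scoring_script="score.py"),
        instance_type=INSTANCE_TYPE,
        instance_count=INSTANCE_COUNT,
        request_settings=OnlineRequestSettings(request_timeout_ms=REQUEST_TIMEOUT_MS),
    )
    ml_client.online_deployments.begin_create_or_update(deployment).result()

    # Route all requests to the new deployment.
    endpoint.traffic = {DEPLOYMENT_NAME: 100}
    return ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

    Creating the endpoint first and the deployment second mirrors the studio flow; keeping traffic assignment as a separate step lets you add a second deployment later and shift traffic gradually.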

    Once the deployment is created, you get an endpoint URL. Requests to it must carry the authentication you selected while deploying (for example, the endpoint key).
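
    Calling the endpoint from Python needs only the standard library. The scoring URI and key are shown on the endpoint's Consume tab in the studio. This sketch assumes key authentication (sent as a bearer token) and a scoring script that accepts a JSON body with `prompt` and `max_new_tokens` fields; that schema is hypothetical, so match whatever your script expects.

```python
import json
import urllib.request


def build_request(scoring_uri: str, api_key: str, prompt: str,
                  max_new_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for a key-authenticated online endpoint."""
    body = json.dumps({"prompt": prompt, "max_new_tokens": max_new_tokens}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def generate(scoring_uri: str, api_key: str, prompt: str,
             max_new_tokens: int = 256, timeout: float = 120.0):
    """Send the prompt and return the decoded JSON response."""
    request = build_request(scoring_uri, api_key, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

    Usage, with placeholders for your own values: `generate("https://<endpoint-name>.<region>.inference.ml.azure.com/score", "<key>", "Explain transformers in one sentence.")`.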

    Note: write the scoring script so that it accepts the input sent in the request, processes it, and returns the result.