google-cloud-vertex-ai

What's the model id for a Model Garden model when deploying using Vertex AI?


I'm trying to follow this quickstart to deploy a model on Vertex AI:

https://cloud.google.com/vertex-ai/docs/general/deployment

& I'm at the step where I'm supposed to run:

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100

How do I get a model id for a public model, i.e. one I found in Model Garden?

I tried out Llama 2 from its card page: https://pantheon.corp.google.com/vertex-ai/publishers/meta/model-garden/llama2

& used its model id: publishers/meta/models/llama2

However, I then get this error:

ERROR: (gcloud.ai.endpoints.deploy-model) There is an error while getting the model information. Please make sure the model 'projects/my-test-project/locations/us-east1/models/publishers/meta/models/llama2' exists.

The error suggests it's reading from my project's Model Registry rather than the public one. Can I force it to read from the public Model Garden instead? Or do I need to do some setup first, like downloading the model into my private Model Registry?

Things I've tried:

It does look like I can deploy the model from Model Garden to my endpoint by clicking the "Deploy" button from the Model Garden card UI - but I am trying to run this specifically using the gcloud CLI.


Solution

  • The model id is indeed expected to be a model in the Vertex AI Model Registry, not the id shown on the Model Garden card. That means I need to store the model artifacts myself in a Cloud Storage bucket & pair them with a serving container.

    This ended up being several steps, most of which I got from the code in the Jupyter notebook opened with the "open workbook" button on the Model Garden page.

    Here were those steps:

    1. Copied the model to my own Cloud Storage bucket:

       gsutil -m cp -R gs://vertex-model-garden-public-us-central1/llama2 YOUR_CLOUD_STORAGE_BUCKET_URI

       This source bucket was listed on the Model Garden card page, and the command took a while since the model tree is ~600 GB. In retrospect I probably only needed a small part of this data (the /llama2/specific-model folder).

    2. Uploaded the model to the Model Registry:

       gcloud ai models upload \
         --container-image-uri="us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20240222_0916_RC00" \
         --display-name=my-model-display-name \
         --artifact-uri='YOUR_CLOUD_STORAGE_BUCKET_URI/llama2/llama2-7b-chat-hf' \
         --project=your-project \
         --region=us-east1

    3. Listed the models to get the MODEL_ID, which was a random sequence of numbers:

       gcloud ai models list \
         --region=us-east1 \
         --project=your-project \
         --filter=display_name=my-model-display-name

    4. Now I'm able to run the deploy command & it seems to be working:

       gcloud ai endpoints deploy-model ENDPOINT_ID \
         --region=LOCATION_ID \
         --model=MODEL_ID \
         --display-name=DEPLOYED_MODEL_NAME \
         --min-replica-count=MIN_REPLICA_COUNT \
         --max-replica-count=MAX_REPLICA_COUNT \
         --traffic-split=0=100
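The four steps above can be collected into one script sketch. Every concrete value here (project, region, bucket, endpoint id, replica counts) is a placeholder assumption rather than something from the quickstart, and the cloud calls are gated behind `RUN_DEPLOY=1` so running the sketch as-is does nothing:

```shell
#!/usr/bin/env bash
# Sketch of the full flow. All names below (project, region, bucket,
# endpoint, display name, replica counts) are placeholder assumptions --
# substitute your own. Set RUN_DEPLOY=1 to actually execute the cloud calls.
set -euo pipefail

PROJECT="your-project"
REGION="us-east1"
BUCKET="gs://your-bucket"
DISPLAY_NAME="my-model-display-name"
ENDPOINT_ID="your-endpoint-id"
SERVE_IMAGE="us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20240222_0916_RC00"

# The upload step generates a numeric MODEL_ID. gcloud's generic
# --format='value(name)' flag returns the full resource name, and stripping
# everything up to the last '/' leaves just the number. Shown here on a
# made-up resource name:
SAMPLE="projects/your-project/locations/us-east1/models/1234567890123456789"
echo "numeric id: ${SAMPLE##*/}"   # -> numeric id: 1234567890123456789

if [ "${RUN_DEPLOY:-0}" = "1" ]; then
  # 1. Copy the weights from the public Model Garden bucket (one variant only,
  #    to avoid pulling the whole ~600 GB tree).
  gsutil -m cp -R \
    "gs://vertex-model-garden-public-us-central1/llama2/llama2-7b-chat-hf" \
    "$BUCKET/llama2/llama2-7b-chat-hf"

  # 2. Upload to the Model Registry, pairing the weights with a serving container.
  gcloud ai models upload \
    --container-image-uri="$SERVE_IMAGE" \
    --display-name="$DISPLAY_NAME" \
    --artifact-uri="$BUCKET/llama2/llama2-7b-chat-hf" \
    --project="$PROJECT" --region="$REGION"

  # 3. Look up the generated MODEL_ID and keep only the numeric suffix.
  MODEL_ID=$(gcloud ai models list --project="$PROJECT" --region="$REGION" \
    --filter="display_name=$DISPLAY_NAME" --format='value(name)')
  MODEL_ID="${MODEL_ID##*/}"

  # 4. Deploy the registered model to the endpoint.
  gcloud ai endpoints deploy-model "$ENDPOINT_ID" \
    --project="$PROJECT" --region="$REGION" \
    --model="$MODEL_ID" \
    --display-name="$DISPLAY_NAME" \
    --min-replica-count=1 --max-replica-count=1 \
    --traffic-split=0=100
fi
```

The `${MODEL_ID##*/}` trick is plain bash parameter expansion, so step 3 can feed step 4 without manually copying the numeric id out of the list output.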