I'm trying to follow this quickstart to deploy a model on Vertex AI:
https://cloud.google.com/vertex-ai/docs/general/deployment
& I'm at the step where I'm supposed to run:
gcloud ai endpoints deploy-model ENDPOINT_ID \
--region=LOCATION_ID \
--model=MODEL_ID \
--display-name=DEPLOYED_MODEL_NAME \
--min-replica-count=MIN_REPLICA_COUNT \
--max-replica-count=MAX_REPLICA_COUNT \
--traffic-split=0=100
How do I get a model id for a public model, i.e. one I found in Model Garden?
I tried out Llama 2 from its card page: https://pantheon.corp.google.com/vertex-ai/publishers/meta/model-garden/llama2
& used its model id: publishers/meta/models/llama2
However then I get error:
ERROR: (gcloud.ai.endpoints.deploy-model) There is an error while getting the model information. Please make sure the model 'projects/my-test-project/locations/us-east1/models/publishers/meta/models/llama2' exists.
The error looks like it's trying to read from my project / model registry rather than the public one. Can I force it to read from the public Model Garden one? Or do I need to do some setup like downloading the model to my private Model Registry?
Things I've tried:
It does look like I can deploy the model from Model Garden to my endpoint by clicking the "Deploy" button from the Model Garden card UI - but I am trying to run this specifically using the gcloud CLI.
The model id is indeed expected to be a model in the Vertex AI model registry, not the model id displayed on the Model Garden card. That means I need to store the model myself on a gs bucket & combine it with a container.
This ended up being several steps, most of which I got from the code in a Jupyter workbook opened via the "open workbook" button on the Model Garden page.
Here were those steps:
Copied the model to my own gs bucket with gsutil -m cp -R gs://vertex-model-garden-public-us-central1/llama2 YOUR_CLOUD_STORAGE_BUCKET_URI. This gs bucket was listed on the Model Garden card page, and the command took a while since the model tree is 600GB. In retrospect I probably only needed a small part of this data (the /llama2/specific-model folder).
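If you know which variant you need up front, a sketch of copying just that one folder (the folder name here is the 7B chat variant from the upload step below; the destination bucket URI is a placeholder, substitute your own):

```shell
# Sketch: copy only one model variant instead of the full ~600GB tree.
# BUCKET_URI is a placeholder -- substitute your own Cloud Storage bucket.
BUCKET_URI="gs://your-bucket"
gsutil -m cp -R \
  gs://vertex-model-garden-public-us-central1/llama2/llama2-7b-chat-hf \
  "${BUCKET_URI}/llama2/"
```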
Uploaded the model to Model Registry with:
gcloud ai models upload \
--container-image-uri="us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20240222_0916_RC00" \
--display-name=my-model-display-name \
--artifact-uri='YOUR_CLOUD_STORAGE_BUCKET_URI/llama2/llama2-7b-chat-hf' \
--project=your-project \
--region=us-east1
Listed the model with gcloud ai models list --region=us-east1 --project=your-project --filter=display_name=my-model-display-name to get the MODEL_ID, which was a random sequence of numbers.
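To avoid copying that ID by hand, the same list command can capture it into a variable (a sketch; it assumes the display name matches exactly one model in that region):

```shell
# Sketch: grab the model's full resource name, then keep the numeric ID.
# Assumes display_name=my-model-display-name matches exactly one model.
MODEL_NAME=$(gcloud ai models list \
  --region=us-east1 --project=your-project \
  --filter=display_name=my-model-display-name \
  --format='value(name)')
# name looks like projects/PROJECT/locations/us-east1/models/1234567890;
# strip everything up to the last slash to get the numeric MODEL_ID.
MODEL_ID="${MODEL_NAME##*/}"
echo "$MODEL_ID"
```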
Now I'm able to run:
gcloud ai endpoints deploy-model ENDPOINT_ID \
--region=LOCATION_ID \
--model=MODEL_ID \
--display-name=DEPLOYED_MODEL_NAME \
--min-replica-count=MIN_REPLICA_COUNT \
--max-replica-count=MAX_REPLICA_COUNT \
--traffic-split=0=100
& it seems to be working.
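For concreteness, here's that command with hypothetical filled-in values (every value below is a placeholder; substitute your own endpoint ID, region, and the numeric model ID from the list step):

```shell
# All values below are hypothetical placeholders.
ENDPOINT_ID="1234567890123456789"   # e.g. from: gcloud ai endpoints list --region=us-east1
MODEL_ID="9876543210987654321"      # the numeric ID from the models list step

gcloud ai endpoints deploy-model "$ENDPOINT_ID" \
  --region=us-east1 \
  --model="$MODEL_ID" \
  --display-name=llama2-7b-chat \
  --min-replica-count=1 \
  --max-replica-count=1 \
  --traffic-split=0=100
```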