google-cloud-platform, google-cloud-vertex-ai, image-classification

Batch Prediction for (Zero-Shot) Image Classification Model on GCP Vertex AI


I'm implementing an image classification model (specifically CLIP, one of the models provided in Model Garden) hosted on Google Cloud Vertex AI. Following the included Jupyter notebook, I was able to upload and deploy the model and run online predictions with it. However, I can't get a batch prediction on an image to work with the same model.

In the Jupyter notebook, the online prediction code takes JPG images downloaded from the internet, converts them to Base64, and builds an instances array in which each object has an image field and a text field (the label for zero-shot classification).

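import base64
from io import BytesIO

def image_to_base64(image, format="JPEG"):
    # Save the image into an in-memory buffer, then Base64-encode the bytes.
    buffer = BytesIO()
    image.save(buffer, format=format)
    image_str = base64.b64encode(buffer.getvalue()).decode("utf-8")
    return image_str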

instances = [
    {"image": image_to_base64(image1), "text": "two cats"},
    {"image": image_to_base64(image2), "text": "a bear"},
]

preds = endpoint.predict(instances=instances).predictions

Following the GCP documentation for batch prediction, I created a JSON Lines file from a JPG image converted to Base64, mirroring the instance format used for the online prediction.

batch_predict.jsonl

{"image": "<B64_OF_JPG_IMAGE>", "text": "rack"}
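
A file like this could be produced with a short script. The following is only a sketch: `write_batch_jsonl` and its arguments are hypothetical names, and it Base64-encodes the JPG bytes directly from disk rather than re-saving the image through a buffer as the notebook helper does.

```python
import base64
import json
from pathlib import Path


def write_batch_jsonl(items, out_path):
    """Write one JSON object per line: {"image": <base64>, "text": <label>}.

    items: iterable of (image_path, label) pairs (hypothetical inputs).
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for image_path, label in items:
            # Read the raw JPG bytes and encode them as a Base64 string.
            raw = Path(image_path).read_bytes()
            b64 = base64.b64encode(raw).decode("utf-8")
            # json.dumps keeps each instance on a single line, as JSONL requires.
            f.write(json.dumps({"image": b64, "text": label}) + "\n")
    return out_path
```

For example, `write_batch_jsonl([("rack.jpg", "rack")], "batch_predict.jsonl")` would write a one-line file in the format above, where `rack.jpg` is a placeholder path.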

I then sent a request to the uploaded model using the following code, lifted from the documentation. (I had deleted the endpoint used in the Jupyter notebook, since batch predictions only require the model to be uploaded to Model Registry, not deployed.)

model.batch_predict(
  job_display_name='test-batch-prediction-job',
  instances_format='jsonl',
  machine_type='n1-standard-8',
  accelerator_type="NVIDIA_TESLA_T4",
  accelerator_count=1,
  gcs_source='gs://' + GCS_INPUT_BUCKET + '/batch_predict.jsonl',
  gcs_destination_prefix='gs://' + GCS_BUCKET,
  service_account=SERVICE_ACCOUNT
)

After the batch prediction job finishes, there are two files in the output folder. The prediction results file is empty; I expected it to look something like this example response from a different doc:

{
  "instance": {"content": "gs://bucket/image.jpg", "mimeType": "image/jpeg"},
  "prediction": {
    "ids": [1, 2],
    "displayNames": ["cat", "dog"],
    "confidences": [0.7, 0.5]
  }
}

The errors file contains the following message:

('Post request fails. Cannot get predictions. Error: Exceeded retries: Non-OK result 503 ({\n "code": 503,\n "type": "InternalServerException",\n "message": "Prediction failed"\n}\n) from server, retry=3, ellapsed=0.07s.', 1)

I can't glean much from this, and the logs similarly provide little to act on.
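
The message itself has some structure: it is a Python tuple literal whose first element embeds the HTTP status and a JSON body. A minimal sketch for unpacking it follows, assuming every line in the errors file has this same shape; `parse_batch_error` is a hypothetical helper, and the meaning of the tuple's second element is not documented here, so it is returned unchanged.

```python
import ast
import json
import re


def parse_batch_error(line):
    """Split one errors-file line into (status, body, second_field)."""
    # The line is a tuple literal: ('<message>', <int>).
    message, second_field = ast.literal_eval(line)
    # Pull out the status code and the JSON object inside the parentheses.
    match = re.search(r"Non-OK result (\d+) \((\{.*?\})\s*\)", message, re.DOTALL)
    if match is None:
        # Unknown shape: return the raw message instead of guessing.
        return None, message, second_field
    status = int(match.group(1))
    body = json.loads(match.group(2))
    return status, body, second_field
```

Applied to the message above, this separates the 503 status from the body whose "message" field is "Prediction failed". That confirms the failure is reported by the server handling the prediction request, but it does not say why.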

The GCP documentation for JSONL files also mentions a slight formatting difference for PyTorch prebuilt containers (I believe this model counts, and I tried both ways): the object must be set as the value of a "data" property.

batch_predict.jsonl

{"data": {"image": "<B64_OF_JPG_IMAGE>", "text": "rack"}}

This did not work either. I've also tried changing the "image" property to "b64", replacing the Base64 string with a link to the JPG in a Cloud Storage bucket, creating the batch prediction from the Google Cloud Console, and adjusting the permissions granted to the service account. All of these result in similar errors.
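
One way to rule out a malformed input file before submitting a job is a local sanity check. The sketch below is an assumption about what is worth checking: valid JSON on each line, a string "text", and an "image" that decodes as Base64 and begins with the JPEG start-of-image bytes. `check_jsonl_line` and its `wrapped` flag (for the "data" variant) are hypothetical names, and passing the check does not guarantee that the model server accepts the instance.

```python
import base64
import binascii
import json


def check_jsonl_line(line, wrapped=False):
    """Return a list of problems found in one JSONL line (empty list = OK)."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]
    if wrapped:
        # Variant where the instance is nested under a "data" property.
        obj = obj.get("data")
        if not isinstance(obj, dict):
            return ['missing "data" object']
    problems = []
    if not isinstance(obj.get("text"), str):
        problems.append('missing or non-string "text"')
    image = obj.get("image")
    if not isinstance(image, str):
        problems.append('missing or non-string "image"')
        return problems
    try:
        raw = base64.b64decode(image, validate=True)
    except (binascii.Error, ValueError):
        problems.append('"image" is not valid Base64')
        return problems
    # JPEG files begin with the start-of-image marker FF D8.
    if not raw.startswith(b"\xff\xd8"):
        problems.append("decoded image does not start with JPEG bytes")
    return problems
```

Running it over every line of `batch_predict.jsonl` and printing any non-empty result would at least separate file-format mistakes from problems on the serving side.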

Is there something wrong with the way I'm formatting my JSONL file for batch predictions?

Could this error be related to the way the model is set up for batch processing on Vertex AI?

Could this be a problem with the model (CLIP) being unsuitable for batch predictions?

Are there specific settings or configurations I should check on GCP Vertex AI for batch predictions with this kind of model?


Solution

  • After extensive trial and error getting batch prediction to work for the ML image models in GCP Vertex AI's Model Garden, I think I've pinned down a few key things to check if it isn't working:

    [Image: New Batch Prediction form, Service Account selector under Advanced Options]