google-cloud-platform google-ai-platform google-cloud-vertex-ai

Vertex AI batch prediction location


When I start a batch prediction job on Google Cloud's Vertex AI, I have to specify a Cloud Storage bucket location. Suppose I provide the bucket location 'my_bucket/prediction/'; the prediction files are then stored in something like gs://my_bucket/prediction/prediction-test_model-2022_01_17T01_46_39_898Z, which is a subdirectory within the location I provided. The prediction files inside that subdirectory are named:

prediction.results-00000-of-00002
prediction.results-00001-of-00002

Is there any way to programmatically get the final export location from the batch prediction name, ID, or any other parameter shown in the details of the batch prediction job?


Solution

  • Those parameters alone are not enough: you can run the same job multiple times, and each run creates a new folder named after its execution date. You can, however, get the location from the API using your job ID (don't forget to set credentials via GOOGLE_APPLICATION_CREDENTIALS if you are not running on the Cloud SDK):

    Get the output directory from the Vertex AI batch prediction API using the job ID:

    curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" "https://us-central1-aiplatform.googleapis.com/v1/projects/[PROJECT_NAME]/locations/us-central1/batchPredictionJobs/[JOB_ID]"
    

    Output (read the value of gcsOutputDirectory, which the full response nests under outputInfo):

    {
    ...
       "gcsOutputDirectory": "gs://my_bucket/prediction/prediction-test_model-2022_01_17T01_46_39_898Z"
    ...
    }
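
If you save the curl response and want to pull the directory out in a script, a minimal sketch could look like the following. The helper name `find_gcs_output_directory` and the sample payload are assumptions made for illustration; the search is recursive so it does not depend on exactly where the key sits in the response.

```python
import json
from typing import Any, Optional


def find_gcs_output_directory(payload: Any) -> Optional[str]:
    """Return the first "gcsOutputDirectory" string found in parsed JSON, or None."""
    if isinstance(payload, dict):
        value = payload.get("gcsOutputDirectory")
        if isinstance(value, str):
            return value
        children = list(payload.values())
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    # Walk nested objects and arrays until the key turns up.
    for child in children:
        found = find_gcs_output_directory(child)
        if found is not None:
            return found
    return None


# Hypothetical response body, trimmed to the one field of interest.
sample_response = json.loads(
    '{"outputInfo": {"gcsOutputDirectory": '
    '"gs://my_bucket/prediction/prediction-test_model-2022_01_17T01_46_39_898Z"}}'
)
output_dir = find_gcs_output_directory(sample_response)
print(output_dir)
```

A job that has not finished may not report an output directory yet, in which case the helper returns None.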
    

    EDIT: Getting batchPredictionJobs via Python API:

    from google.cloud import aiplatform


    def get_batch_prediction_job_sample(
        project: str,
        batch_prediction_job_id: str,
        location: str = "us-central1",
        api_endpoint: str = "us-central1-aiplatform.googleapis.com",
    ):
        # The API endpoint must match the region the job ran in.
        client_options = {"api_endpoint": api_endpoint}
        client = aiplatform.gapic.JobServiceClient(client_options=client_options)

        # Full resource name:
        # projects/{project}/locations/{location}/batchPredictionJobs/{batch_prediction_job}
        name = client.batch_prediction_job_path(
            project=project, location=location, batch_prediction_job=batch_prediction_job_id
        )
        response = client.get_batch_prediction_job(name=name)
        print("response:", response)
        # The final export location is part of the job's output info.
        print("output directory:", response.output_info.gcs_output_directory)
        return response


    get_batch_prediction_job_sample("[PROJECT_NAME]", "[JOB_ID]", "us-central1", "us-central1-aiplatform.googleapis.com")
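
Once you have the output directory, you will usually need the bucket name and object prefix separately to read the result files. The sketch below uses only the standard library; `split_gcs_uri` and `result_shard_names` are hypothetical helpers, and the shard naming simply follows the pattern shown in the question rather than any documented guarantee.

```python
from typing import List, Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path' into (bucket, path without a trailing slash)."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, prefix = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, prefix.rstrip("/")


def result_shard_names(prefix: str, shard_count: int) -> List[str]:
    """Build object names following the pattern seen in the question (an assumption)."""
    return [
        f"{prefix}/prediction.results-{i:05d}-of-{shard_count:05d}"
        for i in range(shard_count)
    ]


bucket, prefix = split_gcs_uri(
    "gs://my_bucket/prediction/prediction-test_model-2022_01_17T01_46_39_898Z"
)
print(bucket)
for name in result_shard_names(prefix, 2):
    print(name)
```

Since the number of shards is not known in advance, listing the objects under the prefix with a Cloud Storage client is likely more robust than constructing the names by hand.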
    

    For more details, see the Vertex AI batch prediction documentation and the API client repository.