google-cloud-platform google-cloud-ml google-cloud-automl google-cloud-vertex-ai

Vertex AI model batch prediction failed with internal error


I have trained an AutoML classification model on Vertex AI. Unfortunately, the model does not work with batch predictions: whenever I try to score the training dataset (the same one that was used for the successful model training) with batch predictions on Vertex AI, I get the following error:

"Due to one or more errors, this training job was canceled on Nov 11, 2021 at 09:42AM".

There is an option to view the details of this error, and they say the following:

"Batch prediction job customer_value_label_cv_automl_gui encountered the following errors: INTERNAL"

Does anyone know what might cause this kind of error? I am very surprised that the model cannot score the dataset it was trained on. My dataset has 570 columns and about 300k records.



Solution

  • We finally figured this out. We were using the model.batch_predict method described in the official documentation and had unnecessarily set the machine_type parameter. That turned out to be the cause of the issue; the machine we specified was probably too weak. Once we removed this argument, the method started to use automatic resources, and that solved the problem. I wish Vertex AI errors were a little more informative, because it took us a lot of trial and error to figure this out.
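
For anyone hitting the same error, here is a minimal sketch of what the fix might look like with the google-cloud-aiplatform Python SDK. It is not our exact code: the helper names (build_batch_predict_kwargs, run_batch_prediction), the BigQuery input and output, and all resource names in the usage note are assumptions made for illustration. The only point it makes is that machine_type is left out of the batch_predict call unless you explicitly need it.

```python
from typing import Any, Dict, Optional


def build_batch_predict_kwargs(
    job_display_name: str,
    bigquery_source: str,
    bigquery_destination_prefix: str,
    machine_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect arguments for Model.batch_predict (hypothetical helper).

    machine_type is only included when the caller sets it explicitly.
    Leaving it out lets the job run on automatic resources, which is
    what resolved the INTERNAL error described above.
    """
    kwargs: Dict[str, Any] = {
        "job_display_name": job_display_name,
        # Assumption: the data lives in BigQuery; use gcs_source and
        # gcs_destination_prefix instead if it is stored in Cloud Storage.
        "bigquery_source": bigquery_source,
        "bigquery_destination_prefix": bigquery_destination_prefix,
        "instances_format": "bigquery",
        "predictions_format": "bigquery",
    }
    if machine_type is not None:
        kwargs["machine_type"] = machine_type
    return kwargs


def run_batch_prediction(
    project: str,
    location: str,
    model_resource_name: str,
    **batch_kwargs: Any,
):
    """Start a batch prediction job (needs the SDK and GCP credentials)."""
    # Imported lazily so the helper above can be used without the SDK.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    model = aiplatform.Model(model_resource_name)
    return model.batch_predict(**batch_kwargs)
```

Usage would be something like run_batch_prediction("my-project", "us-central1", "projects/.../models/...", **build_batch_predict_kwargs("my_batch_job", "bq://my-project.my_dataset.my_table", "bq://my-project.my_dataset")), where every name is a placeholder to replace with your own.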