I'm running Vertex AI batch predictions with a custom XGBoost model, using Explainable AI with Shapley values.
The explanation part is quite computationally intensive, so I've tried splitting the input dataset into chunks and submitting 5 batch prediction jobs in parallel. When I do this I receive the error "Quota exhausted. Please reach to ai-platform-unified-feedback@google.com for batch prediction quota increase".
I don't understand why I'm hitting the quota. According to the docs there is a limit on the number of concurrent jobs for AutoML models but it doesn't mention custom models.
Is the quota perhaps on the number of instances the batch predictions are running on? I'm using an n1-standard-8 instance for my predictions.
I've tried changing the instance type and launching fewer jobs in parallel, but I still get the same error.
After I reached out to Google support regarding this issue, they explained that the quota is based on the number of vCPUs used by the batch prediction job. The formula to calculate this is:
(vCPUs per machine) × (number of machines), multiplied by 3 if explanations are enabled, because in that case a separate node is spun up which requires additional resources.
For example, using 50 e2-standard-4 machines to run a batch prediction with explanations results in 50 × 4 × 3 = 600 vCPUs being used in total.
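The calculation above can be sketched as a small helper; the function name and signature are illustrative, not part of any Vertex AI API:

```python
def batch_prediction_vcpus(vcpus_per_machine: int, machine_count: int,
                           explanations_enabled: bool = False) -> int:
    """Estimate total vCPUs counted against the batch prediction quota,
    per the formula Google support described."""
    # Explanations spin up a separate node, tripling the vCPU footprint.
    multiplier = 3 if explanations_enabled else 1
    return vcpus_per_machine * machine_count * multiplier

# 50 e2-standard-4 machines (4 vCPUs each) with explanations:
print(batch_prediction_vcpus(4, 50, explanations_enabled=True))   # 600
# The same fleet without explanations:
print(batch_prediction_vcpus(4, 50, explanations_enabled=False))  # 200
```

Running several such jobs in parallel sums their totals, which is how a handful of concurrent explanation jobs can exhaust the quota.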
The default quota for a Google Cloud project is 2,200 vCPUs in the europe-west2 region. Moreover, this limit is not visible in your own project, but in a hidden project visible only to Google engineers, so you must raise a support ticket if you need the quota increased.