google-cloud-platform, google-cloud-automl, google-cloud-vertex-ai

Google Vertex AI AutoML training fails because the source BigQuery dataset is too large


I am currently training some models with Google's AutoML feature within their Vertex AI product.

The normal pipeline is to create a dataset, which I do by creating a table in BigQuery, and then to start the training process.
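For context, a minimal sketch of that kind of pipeline using the Vertex AI Python SDK looks like this (the project, location, table, and column names below are placeholders, not my actual setup):

```python
from google.cloud import aiplatform

# Placeholder project and region -- substitute your own.
aiplatform.init(project="my-project", location="us-central1")

# Create a Vertex AI tabular dataset backed by a BigQuery table.
dataset = aiplatform.TabularDataset.create(
    display_name="my-dataset",
    bq_source="bq://my-project.my_dataset.my_table",
)

# Start an AutoML Tables training job on that dataset.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="my-training-job",
    optimization_prediction_type="regression",
)
model = job.run(
    dataset=dataset,
    target_column="label",          # placeholder target column
    budget_milli_node_hours=1000,
)
```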

This has worked fine before, but for my latest dataset I get the following error message:

Training pipeline failed with error message: The size of source BigQuery table is larger than 107374182400 bytes.

While it seemed unlikely to me that the table is actually too large for AutoML, I tried re-training on a new dataset that is a 50% sample of the original table, but the same error occurred.
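For reference, a sketch of how such a sample can be materialized in BigQuery (project and table names are placeholders, and this is only an illustration of the approach, not necessarily the exact query I ran):

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Materialize a roughly 50% random sample of the source table into a new table.
query = """
CREATE OR REPLACE TABLE `my-project.my_dataset.my_table_sample` AS
SELECT *
FROM `my-project.my_dataset.my_table`
WHERE RAND() < 0.5
"""
client.query(query).result()  # wait for the query job to finish
```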

Is my dataset really too large for AutoML to handle, or is there another issue?


Solution

  • AutoML Tables has limits along several dimensions: not only the size in bytes (100 GB maximum supported size), but also the number of rows (~200 billion) and the number of columns (up to 1,000).

    You can find more details in the AutoML Tables limits documentation.

    Is your source data within those limits? You can check the table's metadata quickly, as sketched below.
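A minimal sketch of such a check, assuming the google-cloud-bigquery client and a placeholder table ID:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Fetch table metadata only -- no data is scanned or billed.
table = client.get_table("my-project.my_dataset.my_table")  # placeholder table ID

print(f"Size in bytes: {table.num_bytes} ({table.num_bytes / 1024**3:.1f} GiB)")
print(f"Row count:     {table.num_rows}")
print(f"Column count:  {len(table.schema)}")

# The error in the question corresponds to the 100 GB source limit
# (107374182400 bytes).
if table.num_bytes > 107374182400:
    print("Table exceeds the 100 GB AutoML source limit.")
```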