google-cloud-platformgoogle-cloud-vertex-aivertex-ai-pipeline

Vertex AI Data Store indexing doesn't start


Summary:

Vertex AI Data Store indexing hangs on Document ingestion is working in progress. Document parsing and indexing will start later. for each document.

Details:

I use Airbyte to put Confluence pages into BigQuery. Then I use Vertex AI Data Store to index these documents - I have ~1300 documents imported into it and I see Document ingestion is working in progress. Document parsing and indexing will start later. next to each document in the status column.

The documents are imported into data store (when I click View document button I see the data), but the status is as described above. When I use Vertex AI search with Grounding, this data store is not giving any meaningful results.

Surprisingly that has worked for me well on another (private) account a month back, but here on another account (belonging to my company's organization) the indexing doesn't work. I have tried creating the data store in various regions, using periodical sync or one-off sync, using different schemas etc. It always ends up like this.

I confirmed that I can perform billable actions btw. When I create Data Store based on Cloud Storage bucket, it works fine, so it seems it's related to BigQuery only.

Screenshots

Today's import: data store view

Import details: import errors

Confirmed I have not hit the Discovery Engine limits: Discovery Engine limits


Solution

  • Solved!

    Terrible DevEx on Google's side on this one.

    I needed to create a Vertex AI app that would actually use this data store before it started indexing the data. The fact that I used this data store in chat for grounding and for custom Cloud Run function did not seem to matter to the data store.

    Since it was a data store with multiple entities I had to create a Search type of app.