Summary:
Vertex AI Data Store indexing hangs on "Document ingestion is working in progress. Document parsing and indexing will start later." for each document.
Details:
I use Airbyte to put Confluence pages into BigQuery. Then I use Vertex AI Data Store to index these documents. I have ~1300 documents imported into it, and I see "Document ingestion is working in progress. Document parsing and indexing will start later." next to each document in the status column.
The documents are imported into the data store (when I click the View document button I see the data), but the status stays as described above. When I use Vertex AI Search with grounding, this data store does not return any meaningful results.
Surprisingly, this worked well for me on another (private) account a month ago, but here on a different account (belonging to my company's organization) the indexing doesn't work. I have tried creating the data store in various regions, using periodic sync or one-off sync, using different schemas, etc. It always ends up like this.
I confirmed that I can perform billable actions, btw. When I create a data store based on a Cloud Storage bucket, it works fine, so it seems to be related to BigQuery only.
Solved!
Terrible DevEx on Google's side on this one.
I needed to create a Vertex AI app that actually uses this data store before it started indexing the data. The fact that I had already used this data store in chat for grounding and from a custom Cloud Run function did not seem to matter to the data store.
Since it was a data store with multiple entities, I had to create a Search type of app.
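For anyone who would rather script this step than click through the console, here is a minimal sketch of what the request for creating a Search app attached to an existing data store might look like. It only builds the URL and JSON body for the Discovery Engine REST `engines` create call; it does not send anything. The function name and all IDs (`my-project`, `confluence-search`, `confluence-bq-store`) are hypothetical placeholders, and the choice of `SEARCH_TIER_STANDARD` is my assumption, not something I verified against this particular setup. I also have not confirmed that creating the app this way triggers indexing the same way the console flow did for me.

```python
from urllib.parse import urlencode


def build_create_search_app_request(project_id, location, engine_id,
                                    data_store_ids, display_name):
    """Build (url, body) for creating a Search app (engine) on existing data stores.

    Sketch only: all argument values are supplied by the caller, and the
    request is not sent here.
    """
    # Non-global locations use a location-prefixed API host.
    if location == "global":
        host = "discoveryengine.googleapis.com"
    else:
        host = f"{location}-discoveryengine.googleapis.com"

    parent = (
        f"projects/{project_id}/locations/{location}"
        "/collections/default_collection"
    )
    url = f"https://{host}/v1/{parent}/engines?{urlencode({'engineId': engine_id})}"

    body = {
        "displayName": display_name,
        # A Search type of app, as described in the post above.
        "solutionType": "SOLUTION_TYPE_SEARCH",
        # The data store(s) the app should use.
        "dataStoreIds": list(data_store_ids),
        # Assumption: standard tier; adjust to whatever your project needs.
        "searchEngineConfig": {"searchTier": "SEARCH_TIER_STANDARD"},
    }
    return url, body


if __name__ == "__main__":
    import json

    url, body = build_create_search_app_request(
        project_id="my-project",            # hypothetical
        location="global",
        engine_id="confluence-search",      # hypothetical
        data_store_ids=["confluence-bq-store"],  # hypothetical
        display_name="Confluence search",
    )
    print(url)
    print(json.dumps(body, indent=2))
```

The printed URL and body could then be sent as a POST with an OAuth bearer token (for example one obtained from `gcloud auth print-access-token`). After that, check the data store's document status again to see whether ingestion has moved past the "in progress" message.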