google-cloud-vertex-ai

How can I filter Google Vertex AI Search to return only PDF files with advanced website indexing?


I'm using Google Vertex AI Search to index all of our websites. The index type is advanced website indexing.

I want to filter my search so that it returns only .pdf files.

According to the documentation at https://cloud.google.com/generative-ai-app-builder/docs/filter-website-search#available-fields-advanced-indexing, only two fields are available for advanced website indexing: siteSearch and meta.

My question: how can I set up my filter to return only ".pdf" files?

I tried {filter: "fileType: \".pdf\""}, but it returned the error: Unsupported field .. I know that fileType is only for simple website indexing.

Also, what filter would return only web pages and exclude files such as PDF or Word documents?

If I test the filter with {filter: "siteSearch: \"https://www.example.com\""}, it works fine.

The documentation doesn't show any further filter examples.


Solution

  • I used siteSearch to return only PDF files:

    {filter: "siteSearch:\".pdf\""}
    

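    So it looks like siteSearch matches against the document URL rather than only the site prefix, which means you can use it to filter on any part of the URL, such as a file extension.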
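
    For anyone building the request programmatically, here is a minimal sketch of how the filter string above could be assembled. The helper names build_site_search_filter and build_search_body are hypothetical (not part of any Google library), and the body assumes the search request takes the "query" and "filter" fields as used in the question. It only builds the payload and does not call the API.

```python
import json


def build_site_search_filter(pattern: str) -> str:
    """Return a filter expression of the form siteSearch:"<pattern>".

    Hypothetical helper. Patterns containing a double quote are rejected
    because the escaping rules for the filter syntax are not covered here.
    """
    if '"' in pattern:
        raise ValueError("pattern must not contain a double quote")
    return f'siteSearch:"{pattern}"'


def build_search_body(query: str, pattern: str) -> dict:
    """Assemble a search request body (assumed shape: query + filter)."""
    return {"query": query, "filter": build_site_search_filter(pattern)}


if __name__ == "__main__":
    # Restrict results to URLs containing ".pdf", as in the answer above.
    print(json.dumps(build_search_body("annual report", ".pdf"), indent=2))
```

    The same helper covers the site filter from the question, e.g. build_site_search_filter("https://www.example.com"). Whether a negated form of this filter can exclude files is not something tested here.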