azure-cosmosdb, azure-cognitive-search

How to index a specific collection in a Cosmos datasource with Azure AI Search?


I have resources set up for Azure AI Search and CosmosDB. My single CosmosDB resource contains two collections, each with documents in it: Collection1 and Collection2.

When creating an index or indexer in Azure AI Search, I can connect it to the CosmosDB resource, but it only returns data from Collection1. That works for Collection1, but when I create another index/indexer using the field names from Collection2, every field comes back null.

I know the index is only reading data from Collection1 because 1) it returns the _ids of documents in Collection1, and 2) when I create an index whose key is a field that is only present in Collection2 (rather than doc_id), the indexer for that index will not run, stating that the "Collection2-only" field does not exist.

How do I get my indexes to return data from a specified collection within a CosmosDB datasource? I am not seeing a field to specify a collection within the index or indexer.

Example of what's happening:

Collection1's example document:

{
    _id: 1,
    "A": "A",
    "B": "B"
}

index1: doc_id is a string and searchable, A and B are strings and filterable and searchable.

indexer1: default, with the dataSourceName being dataSource. targetIndexName is index1

Collection2's example document:

{
    _id: 2,
    "C": "C",
    "D": "D"
}

index2: doc_id is searchable, C and D are filterable and searchable.

indexer2: default, with the dataSourceName being dataSource. targetIndexName is index2

Result of Index1

{
    _id: 1,
    "A": "A",
    "B": "B"
}

Result of Index2

{
    _id: 1,
    "C": null,
    "D": null
}

Note that the result above contains the fields specified in index2, which exist only in Collection2, but it carries the _id of the document in Collection1 and the field values are null.


Solution

  • This was a very easy fix that I had somehow missed.

    When adding a CosmosDB data source, each collection is its own data source. I had incorrectly assumed that a single data source connected all of my collections, when it actually pointed only at Collection1.

    To add another collection as a source: in the Azure AI Search portal, choose Import Data on the Overview page. From there, select the CosmosDB data source option, then choose the database and the collection. The indexer for index2 must then use that new data source as its dataSourceName instead of the original one.

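The same one-data-source-per-collection setup can also be expressed as the JSON definitions the service stores, which may be easier to follow than the portal steps. The sketch below only builds those definitions and prints them; it does not call the service. The shape (type "cosmosdb", the collection under container.name, and the indexer's dataSourceName and targetIndexName) follows the Create Data Source and Create Indexer REST definitions as I understand them. The data source names and the connection string placeholders are hypothetical, and an account using the MongoDB API may need additional connection string settings not shown here.

```python
import json


def build_data_source(name, connection_string, collection):
    """Build a Cosmos DB data source definition that targets ONE collection."""
    return {
        "name": name,
        "type": "cosmosdb",
        "credentials": {"connectionString": connection_string},
        # The collection is chosen here, on the data source,
        # not on the index or the indexer.
        "container": {"name": collection},
    }


def build_indexer(name, data_source_name, target_index_name):
    """Build an indexer definition that ties one data source to one index."""
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": target_index_name,
    }


# Placeholder connection string (assumption): fill in your own account values.
CONNECTION = (
    "AccountEndpoint=https://<account>.documents.azure.com;"
    "AccountKey=<key>;Database=<database>"
)

# One data source per collection (the names are hypothetical).
sources = {
    collection: build_data_source(
        f"datasource-{collection.lower()}", CONNECTION, collection
    )
    for collection in ("Collection1", "Collection2")
}

# Each indexer points at the data source for its own collection.
indexer1 = build_indexer("indexer1", sources["Collection1"]["name"], "index1")
indexer2 = build_indexer("indexer2", sources["Collection2"]["name"], "index2")

print(json.dumps(sources["Collection2"], indent=2))
print(json.dumps(indexer2, indent=2))
```

With this layout, indexer2 reads from Collection2 because its dataSourceName refers to the data source whose container.name is Collection2. In the original setup both indexers shared one data source, which is why index2 received Collection1 documents with null C and D fields.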