Tags: sitecore, sitecore-7.2, coveo

What is the role of the crawler in a Coveo index rebuild?


When we perform an index rebuild in Sitecore for Coveo, how does the Coveo crawler work internally, and how does it publish items to Coveo Cloud?


Solution

  • A crawler is a Coveo Cloud module that scans items to index and extracts their content. If an item is secured, the crawler also extracts its permissions and saves them as item metadata.

    Coveo for Sitecore's default crawler configurations are defined in Coveo.SearchProvider.config; by default, Coveo for Sitecore indexes all content items under /sitecore/content and all media items under /sitecore/media library/Files. You can patch these configurations in Coveo.SearchProvider.Custom.config by changing the crawling root of your indexes, either to prevent undesirable items from being indexed in specific indexes or to add a new crawling root.
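    As a sketch of what such a patch can look like, the snippet below narrows the crawling root of the web index. The index id, crawler type, and exact element names follow the common Coveo for Sitecore pattern but may differ between versions, so treat this as an illustration rather than a drop-in file:

    ```xml
    <!-- Coveo.SearchProvider.Custom.config (illustrative sketch; verify element
         names and the crawler type against your Coveo for Sitecore version) -->
    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
      <sitecore>
        <contentSearch>
          <configuration>
            <indexes>
              <index id="Coveo_web_index">
                <locations>
                  <!-- Restrict crawling to the Home subtree instead of all of /sitecore/content -->
                  <crawler type="Coveo.SearchProvider.SitecoreItemCrawler, Coveo.SearchProviderBase">
                    <Database>web</Database>
                    <Root>/sitecore/content/Home</Root>
                  </crawler>
                </locations>
              </index>
            </indexes>
          </configuration>
        </contentSearch>
      </sitecore>
    </configuration>
    ```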

    Coveo for Sitecore leverages Sitecore index update strategies to automatically index Sitecore items. Created, deleted, and modified items in the master database are indexed as those events occur. In the web database, published items are indexed at the end of the publish operation (OnPublishEndAsync).
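    The update strategies themselves are declared per index in the same configuration. A hedged sketch of how a web index can be wired to the OnPublishEndAsync strategy (the strategy ref path follows the standard Sitecore content search convention; check it against your installed configuration):

    ```xml
    <!-- Illustrative fragment: attach the onPublishEndAsync update strategy
         to the web index so published items are indexed after each publish -->
    <index id="Coveo_web_index">
      <strategies hint="list:AddStrategy">
        <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
      </strategies>
    </index>
    ```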

    With the help of the crawlers configured on the search index, Sitecore iterates over a set of items and passes them to the search index through the Search Provider framework. During indexing, each Sitecore item is converted into a Coveo item and its text is extracted.

    Rebuilding means crawling a set of documents and pushing them into the index. At the end of the process, the search index contains only the crawled documents. Since Coveo Cloud is an online service, the items must be uploaded in order to be indexed.
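    The key property of a rebuild (crawl everything under the root, then replace the index contents with exactly that set) can be sketched in a few lines. This is purely illustrative Python, not the actual .NET crawler; the names `crawl`, `rebuild`, and the dict-as-index are assumptions made for clarity:

    ```python
    # Illustrative sketch of rebuild semantics; the real Coveo for Sitecore
    # crawler is a .NET component and works against Coveo Cloud, not a dict.

    def crawl(root):
        """Walk a tree of items and yield (path, extracted_text) pairs."""
        stack = [root]
        while stack:
            item = stack.pop()
            yield item["path"], item.get("text", "")
            stack.extend(item.get("children", []))

    def rebuild(index, root):
        """A rebuild replaces the whole index with the freshly crawled set."""
        new_docs = dict(crawl(root))   # crawl every item under the root
        index.clear()                  # documents not crawled do not survive
        index.update(new_docs)         # push the crawled documents
        return index
    ```

    Note how `index.clear()` captures the statement above: after the rebuild, only the crawled documents remain; anything previously indexed but no longer reachable from the crawling root is gone.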