episerver episerver-find

Selective Indexing of MediaData in Multisite Applications


I have a multisite application and need to remove search data from the index for the MediaData type, but I want to do this for specific sites only. Is there a way to traverse the site repository within this context and include indexing only for MediaData on certain sites?

  [InitializableModule]
  [ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
  public class FindInitialization : IInitializableModule
  {
    private ContentAssetHelper contentAssetHelper;
    private ContentIndexer contentIndexer;
    private ISiteDefinitionRepository _siteDefinitionRepository;
    private IContentLoader _contentLoader;

    public void Initialize(InitializationEngine context) 
    {
      contentAssetHelper = ServiceLocator.Current.GetInstance<ContentAssetHelper>();
      contentIndexer = ServiceLocator.Current.GetInstance<ContentIndexer>();
      _contentLoader = ServiceLocator.Current.GetInstance<IContentLoader>();
      _siteDefinitionRepository = ServiceLocator.Current.GetInstance<ISiteDefinitionRepository>();

      foreach (var siteDefinition in _siteDefinitionRepository.List())
      {
        var startPage = _contentLoader.Get<StartPage>(siteDefinition.StartPage);
        // Need to include indexing only for specific sites
        // If the site is 'abc' or 'xyz', index only MediaData within that site; otherwise, skip indexing for that site.
        
        ContentIndexer.Instance.Conventions.ForInstancesOf<MediaData>().ShouldIndex(p => true);
      }
    }

    public void Uninitialize(InitializationEngine context) { }
  }


Solution

  • I've done something like this to decide which sites should be indexed.

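        // engineContext is presumably the InitializationEngine passed to Initialize.
        var indexer = this.engineContext.Locate.Advanced.GetInstance<IContentIndexer>();
        var sitesToIndex = new[] { "abc", "xyz" };
        var contentLoader = ServiceLocator.Current.GetInstance<IContentLoader>();
        var siteDefinitionRepository = ServiceLocator.Current.GetInstance<ISiteDefinitionRepository>();
        var allSites = siteDefinitionRepository.List().ToList();
        
        // IDs of every page that belongs to one of the sites to index.
        // A HashSet keeps the lookup in ShouldIndex cheap, since it runs for every page.
        var allChildren = new HashSet<int>();
        
        foreach(var siteToIndex in sitesToIndex) 
        {
            var site = allSites.FirstOrDefault(x => x.Name.Trim().Equals(siteToIndex));
        
            if(site == null) 
            {
                // log or whatever
                Console.WriteLine($"could not find a site named {siteToIndex}");
                continue;
            }
        
            // GetDescendents does not return the root itself, so add the start page explicitly.
            allChildren.Add(site.StartPage.ID);
            allChildren.UnionWith(contentLoader.GetDescendents(site.StartPage).Select(y => y.ID));
        }
        
        // Index a page only if it lives under one of the selected sites.
        indexer.Conventions.ForInstancesOf<PageData>().ShouldIndex(x => allChildren.Contains(x.ContentLink.ID));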
    

    This code marks only the PageData objects that live under the abc and xyz sites for indexing. All that is left to do is to traverse each of those content objects, check whether any MediaData is linked to it, and index those specific items.

    Treat this as a starting point rather than a finished solution, but it will get the job done.
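
    The remaining MediaData step could be sketched along the same lines. This is a minimal sketch, not a tested implementation. It assumes the media for each site is stored under that site's own assets folder (SiteDefinition.SiteAssetsRoot, "For This Site" in the editor). Media in the global assets folder is shared between sites and cannot be attributed to one site this way. MediaIndexingConventions and IndexMediaOnlyForSites are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using EPiServer;
using EPiServer.Core;
using EPiServer.Find.Cms;
using EPiServer.ServiceLocation;
using EPiServer.Web;

// Hypothetical helper (not part of Episerver Find); call it once from the initialization module.
public static class MediaIndexingConventions
{
    public static void IndexMediaOnlyForSites(params string[] siteNames)
    {
        var contentLoader = ServiceLocator.Current.GetInstance<IContentLoader>();
        var siteDefinitionRepository = ServiceLocator.Current.GetInstance<ISiteDefinitionRepository>();

        // IDs of all content found under the assets folders of the selected sites.
        var allowedMediaIds = new HashSet<int>();

        var selectedSites = siteDefinitionRepository.List()
            .Where(s => siteNames.Contains(s.Name.Trim()));

        foreach (var site in selectedSites)
        {
            // Assumption: a site without its own assets folder reports the global
            // assets root here, which is shared by every site, so skip it.
            if (ContentReference.IsNullOrEmpty(site.SiteAssetsRoot) ||
                site.SiteAssetsRoot == site.GlobalAssetsRoot)
            {
                continue;
            }

            foreach (var link in contentLoader.GetDescendents(site.SiteAssetsRoot))
            {
                allowedMediaIds.Add(link.ID);
            }
        }

        // MediaData outside the selected sites' assets folders is kept out of the index.
        ContentIndexer.Instance.Conventions
            .ForInstancesOf<MediaData>()
            .ShouldIndex(media => allowedMediaIds.Contains(media.ContentLink.ID));
    }
}
```

    Calling MediaIndexingConventions.IndexMediaOnlyForSites("abc", "xyz") from Initialize would register the convention. Note that the set of IDs is built once at startup, so media uploaded later is not covered until the application restarts. Media stored in a page's own folder ("For This Page") is not under SiteAssetsRoot either; the ContentAssetHelper already resolved in the question is probably the place to start for those.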