
Offline clustering using Solr?


I want to cluster my indexed data in Solr. Each Solr document contains the following fields: id, title, url.

I have read the Solr 7.7 docs, and the clustering algorithm mentioned there is applied only to the search results of a single query. What I need is clustering of the full index based on the document title.

Can anyone help?


Solution

  • As far as I'm aware, there's no out-of-the-box plugin for clustering the whole Solr index.

    If you have some background in machine learning, have a look at Apache Mahout; it should be suitable for clustering a dataset of this size. Alternatively, there's a commercially licensed Carrot2 spin-off we develop called Lingo4G, which is designed for clustering large collections of text. In both cases, however, there is no direct integration with Solr, so you'd need to handle the integration on your own.
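
The first step of a do-it-yourself integration is usually getting every title out of the index. Below is a rough sketch of one way to do that in Python, using Solr's cursorMark deep paging against the standard /select handler. It assumes `id` is the uniqueKey field (cursorMark needs a sort that includes it). The helper names `fetch_page` and `export_titles` and the collection URL in the usage note are made up for illustration, not part of Solr, Mahout or Lingo4G.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, params):
    """Send one /select request to Solr and return the parsed JSON body."""
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def export_titles(base_url, rows=1000, fetch=fetch_page):
    """Yield (id, title) for every document in the collection.

    Assumes 'id' is the uniqueKey field; 'fetch' can be swapped out
    (e.g. for testing without a running Solr instance).
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": "*:*",
            "fl": "id,title",
            "sort": "id asc",  # cursorMark requires a sort on the uniqueKey
            "rows": rows,
            "wt": "json",
            "cursorMark": cursor,
        }
        data = fetch(base_url, params)
        for doc in data["response"]["docs"]:
            title = doc.get("title", "")
            # Multi-valued fields come back as lists; flatten to one string.
            if isinstance(title, list):
                title = " ".join(title)
            yield doc["id"], title
        next_cursor = data["nextCursorMark"]
        # Solr signals the end by returning the same cursor it was given.
        if next_cursor == cursor:
            break
        cursor = next_cursor
```

You would call it with something like `export_titles("http://localhost:8983/solr/mycollection")` (a placeholder URL) and write the pairs to a file, then convert that file into whatever input format your clustering tool expects. That conversion and the clustering itself are not covered here.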