solr ckan

CKAN: How to enable partial string search?


By default, CKAN only matches full words. I'd like to enable partial string search, similar to the behavior users know from Google.

Example: my dataset titled Testdatensatz should be found when searching for Test, Testd, datensa and so on.

With the default settings, CKAN finds Testdatensatz only if the full word is entered as the search term.

How do I configure CKAN / Solr for partial string matching?

I tried adding EdgeNGramFilterFactory to the Solr config, but without success so far. The filter seems to be ignored.

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    ...
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    ...
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I'm using CKAN 2.10 and 2.11 with Solr 9.

EDIT:

I always rebuilt the search index using ckan search-index rebuild
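
When a schema change seems to be ignored, one way to check what the running Solr instance actually loaded is to read the field type back through Solr's Schema API. The following is a minimal sketch, not part of the original setup: the base URL `http://localhost:8983/solr`, the core name `ckan`, and the helper names `has_filter` and `fetch_field_type` are assumptions and need to be adapted to the deployment.

```python
import json
import urllib.request

# Assumed values (hypothetical defaults): adjust to your deployment.
SOLR_URL = "http://localhost:8983/solr"
CORE = "ckan"


def has_filter(field_type, filter_class, analyzer="indexAnalyzer"):
    """Return True if the given analyzer of a field type lists filter_class."""
    # Field types with a single <analyzer> report it under the key "analyzer".
    chain = field_type.get(analyzer) or field_type.get("analyzer") or {}
    return any(f.get("class") == filter_class for f in chain.get("filters", []))


def fetch_field_type(name, solr_url=SOLR_URL, core=CORE):
    """Fetch one field type definition from the running Solr core."""
    url = f"{solr_url}/{core}/schema/fieldtypes/{name}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["fieldType"]


# Example call (needs a reachable Solr instance):
# has_filter(fetch_field_type("text"), "solr.EdgeNGramFilterFactory")
```

If the call returns False although the local schema.xml contains the filter, the edited file is probably not the one Solr is reading.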


Solution

  • Solution found:

    I deploy using Docker. I had made the changes in the CKAN container, so they had no effect.

    Since Solr runs in a separate container, the changes have to be made there.

    So keep your hands off the file ./ckan/config/solr/schema.xml in the CKAN container and modify https://github.com/ckan/ckan-solr/blob/master/solr-9/Dockerfile instead. A good example of adding filters can be found in https://github.com/ckan/ckan-solr/blob/master/solr-9/Dockerfile.spatial

    After the changes, don't forget to reindex in the CKAN container:

    $ ckan search-index rebuild
    

    and it works.

    Now, the search string Test finds the strings Testdaten, Testdatensatz, ...

    Please note that the search only finds matches that start at the first character of a word. So the search string est does not find Testdaten, Testdatensatz, ... but that's OK.
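
The prefix-only behavior follows from how edge n-grams work: at index time each token is expanded into its leading substrings, and a query term only matches if it equals one of them. Below is a rough Python sketch of that idea, assuming the minGramSize="3" and maxGramSize="15" values from the question. The functions `edge_ngrams` and `matches` are illustrative helpers, not Solr code, and the sketch ignores tokenization and stemming.

```python
def edge_ngrams(token, min_gram=3, max_gram=15):
    """Return the front edge n-grams of a token (prefixes of length min_gram..max_gram)."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def matches(query, title, min_gram=3, max_gram=15):
    """Simplified match: the lowercased query must equal one indexed gram."""
    indexed = set(edge_ngrams(title.lower(), min_gram, max_gram))
    return query.lower() in indexed


print(edge_ngrams("testdatensatz"))         # prefixes from 'tes' up to the full word
print(matches("Test", "Testdatensatz"))     # True
print(matches("Testd", "Testdatensatz"))    # True
print(matches("est", "Testdatensatz"))      # False
print(matches("datensa", "Testdatensatz"))  # False
```

In this simplified model, search terms shorter than the minimum gram size and substrings from the middle of a word, such as datensa, do not match.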