ruby-on-rails ruby elasticsearch searchkick

How to add a token filter to Searchkick?


I have a fairly simple use case, but so far it has been painful trying to find proper syntax.

I have written a few token filters:

  searchkick **{
    settings: {
      analysis: {
        filter: {
          replace_mm: {
            type: 'pattern_replace',
            pattern: '(\\d)mm',
            replacement: '$1 mm ',
          },
          replace_x_1: {
            type: 'pattern_replace',
            pattern: '(\\d) ?(×|x)',
            replacement: '$1 x ',
          },
          replace_x_2: {
            type: 'pattern_replace',
            pattern: '(×|x) ?(\\d)',
            replacement: ' x $2',
          },
          searchkick_edge_ngram: {
            type: 'edge_ngram',
            min_gram: 1,
            max_gram: 50
          },
        },
        analyzer: {
          custom_index: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [
              'replace_mm',
              'replace_x_1',
              'replace_x_2',
              'lowercase',
              'asciifolding',
              'searchkick_edge_ngram',
            ],
          },
          # https://github.com/ankane/searchkick/blob/ee62747af38b264574995341b70bdad432f65e0f/lib/searchkick/index_options.rb#L95
          searchkick_word_start_index: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [
              'lowercase',
              'asciifolding',
              'searchkick_edge_ngram',
            ],
          },
        },
      },
    },
    mappings: {
      properties: {
        name: {
          type: 'text',
          analyzer: 'custom_index',
        },
        alternate_names: {
          type: 'text',
          analyzer: 'custom_index',
        },
        description: {
          type: 'text',
          analyzer: 'searchkick_word_start_index',
        },
      },
    },
    highlight: [
      :name,
    ],
    search_synonyms: [
      # ...
    ]
  }

How do I actually make indexing and search utilize these filters? I want everything else to remain the default; I'd just like to override the filters to add a few more, as everything else works perfectly.


Solution

  • The key here is probably to leverage the merge_mappings parameter and provide custom mappings/settings:

    searchkick merge_mappings: true, mappings: {...}, settings: {...}
    

    If you want to add token filters, you also need analyzers that use them: a token filter cannot be applied on its own, it only runs as part of an analyzer's analysis chain. So you started the right way by adding your token filters to the custom settings, but you also need to add new analyzers to those settings, or override existing ones.

    The second part of the job is to provide custom mappings that reference the new analyzer(s), unless you have overridden the analyzer your field already uses.
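As a quick sanity check for the point above, here is a minimal sketch in plain Ruby that lists custom token filters which no analyzer references. The helper name `unused_filters` and the trimmed `SETTINGS` hash are made up for this illustration (the filter definitions are copied from the question); nothing here is part of Searchkick's API.

```ruby
# Hypothetical helper: returns the names of custom token filters that are
# defined in the settings but not referenced by any analyzer's filter list.
# Such filters would be created in the index yet never run.
def unused_filters(settings)
  analysis = settings.fetch(:analysis, {})
  defined = analysis.fetch(:filter, {}).keys.map(&:to_s)
  referenced = analysis.fetch(:analyzer, {}).values
                       .flat_map { |analyzer| Array(analyzer[:filter]) }
                       .map(&:to_s)
  defined - referenced
end

# Trimmed-down settings for the example: two filters, one analyzer.
SETTINGS = {
  analysis: {
    filter: {
      replace_mm: { type: 'pattern_replace', pattern: '(\\d)mm', replacement: '$1 mm ' },
      replace_x_1: { type: 'pattern_replace', pattern: '(\\d) ?(×|x)', replacement: '$1 x ' }
    },
    analyzer: {
      custom_index: {
        type: 'custom',
        tokenizer: 'standard',
        filter: %w[replace_mm lowercase asciifolding]
      }
    }
  }
}.freeze

UNUSED = unused_filters(SETTINGS)
puts UNUSED.inspect # prints ["replace_x_1"]
```

An empty result means every custom filter is wired into at least one analyzer; it does not tell you whether any field mapping actually uses that analyzer.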

    Looking at how the default settings are defined, your settings structure needs to be like this in order to be correctly merged:

    {
      settings: {
        analysis: {
          filter: {
            replace_mm: {
              type: 'pattern_replace',
              pattern: '(\\d)mm',
              replacement: '$1 mm ',
            },
            replace_x_1: {
              type: 'pattern_replace',
              pattern: '(\\d) ?(×|x)',
              replacement: '$1 x ',
            },
            replace_x_2: {
              type: 'pattern_replace',
              pattern: '(×|x) ?(\\d)',
              replacement: ' x $2',
            },
          },
          analyzer: {
            my_new_analyzer: { ... },
            existing_analyzer: { ... },
          }
        },
      },
      mappings: {
        properties: {
          your_field: {
            type: 'text',
            analyzer: 'my_new_analyzer'
          }
        }
      }
    }
    

    my_new_analyzer would be a new analyzer that you can use for one of your fields in a custom mapping (if provided), whereas existing_analyzer would be an override of an existing one that takes into account your new token filters.
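To illustrate what overriding an existing analyzer involves, here is a minimal sketch, assuming Searchkick combines custom settings with its defaults through a recursive hash merge comparable to ActiveSupport's `deep_merge`. The `deep_merge` helper and the `DEFAULT_SETTINGS` slice below are stand-ins written for this example (the analyzer definition is copied from the question), not Searchkick's actual code.

```ruby
# Recursive merge: nested hashes are merged key by key; any other value
# (including arrays) is replaced by the value from `override`.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Stand-in for a small slice of the default settings (assumption, not the full set).
DEFAULT_SETTINGS = {
  analysis: {
    filter: {
      searchkick_edge_ngram: { type: 'edge_ngram', min_gram: 1, max_gram: 50 }
    },
    analyzer: {
      searchkick_word_start_index: {
        type: 'custom',
        tokenizer: 'standard',
        filter: %w[lowercase asciifolding searchkick_edge_ngram]
      }
    }
  }
}.freeze

# Custom settings: one new filter plus an override of the existing analyzer.
CUSTOM_SETTINGS = {
  analysis: {
    filter: {
      replace_mm: { type: 'pattern_replace', pattern: '(\\d)mm', replacement: '$1 mm ' }
    },
    analyzer: {
      searchkick_word_start_index: {
        filter: %w[replace_mm lowercase asciifolding searchkick_edge_ngram]
      }
    }
  }
}.freeze

MERGED_SETTINGS = deep_merge(DEFAULT_SETTINGS, CUSTOM_SETTINGS)
puts MERGED_SETTINGS[:analysis][:analyzer][:searchkick_word_start_index].inspect
```

Under this kind of merge, arrays are replaced rather than concatenated, so the override restates the full filter list, while keys it leaves out (`type`, `tokenizer`) survive from the defaults.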

    Which analyzer you override depends on the type of your field, as there is a separate analyzer per type, named searchkick_#{type}_index (full list available here).

    I hope this sets you on the right track.

    edit

    I'm also trying to find out how to use this at search time

        rows = Something.search(
          body: {
            query: {
              multi_match: {
                query: params[:query],
                type: '',
                fields: [
                  '',
                ],
              },
            },
          },
        )
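For the search-time side, one possible direction is sketched below. By default Elasticsearch analyzes the query text with each field's search_analyzer, falling back to the analyzer set in the mapping, and a multi_match query also accepts an explicit `analyzer` parameter. The helper name `multi_match_body`, the field list, and the analyzer name `custom_search` are hypothetical; `custom_search` would have to be defined in the custom settings alongside `custom_index`.

```ruby
# Hypothetical helper: builds the body for a multi_match query.
# `analyzer` is optional; when it is nil the key is left out and
# Elasticsearch picks the analyzer from the field mappings.
def multi_match_body(query, fields:, analyzer: nil)
  multi_match = { query: query, fields: fields }
  multi_match[:analyzer] = analyzer if analyzer
  { query: { multi_match: multi_match } }
end

# 'custom_search' is an assumed analyzer name, not one Searchkick provides.
BODY = multi_match_body(
  '10mm x 20mm',
  fields: %w[name alternate_names],
  analyzer: 'custom_search'
)
puts BODY.inspect

# With Searchkick loaded, the body would be passed through as-is:
#   rows = Something.search(body: BODY)
```

Since the index analyzers in the question end with an edge n-gram filter, a separate search-time analyzer without that filter is a common choice, but whether it fits this case is a judgment call.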