elasticsearch porter-stemmer

ElasticSearch and Porterstem analyser


I'm looking at using Elasticsearch to provide the search functions of our site.

I've been experimenting with it but am unable to enable the Porterstem analyser (so that a search for "fight" also matches "fights" and "fighting").

Here's a rundown of my input:

    curl -XPUT localhost:9200/local/ -d'
    index :                     
        analysis : 
            analyzer : 
                stemming : 
                    type : custom 
                    tokenizer : standard 
                    filter : [standard, lowercase, stop, porterStem] 
    '

    curl -XPUT localhost:9200/local/_mapping -d'{"properties": { "title" : { "analyzer" : "stemming", "type" : "string" }}}'

    curl -XPUT localhost:9200/local/article/1 -d'{"title": "Fight for your life"}'
    curl -XPUT localhost:9200/local/article/2 -d'{"title": "Fighting for your life"}'
    curl -XPUT localhost:9200/local/article/3 -d'{"title": "My dad fought a dog"}'
    curl -XPUT localhost:9200/local/article/4 -d'{"title": "Bruno fights Tyson tomorrow"}'

However, running a search for 'fight' only matches the first entry - the one that contains the exact term.

    curl -XGET 'localhost:9200/local/_search?q=fight'

The settings appear to have been applied correctly, but the stemming doesn't seem to work.

      "indices" : {
        "local" : {
          "aliases" : [ ],
          "settings" : {
            "index.analysis.analyzer.stemming.type" : "custom",
            "index.analysis.analyzer.stemming.tokenizer" : "standard",
            "index.analysis.analyzer.stemming.filter.1" : "lowercase",
            "index.analysis.analyzer.stemming.filter.0" : "standard",
            "index.analysis.analyzer.stemming.filter.3" : "porterStem",
            "index.analysis.analyzer.stemming.filter.2" : "stop",
            "index.number_of_shards" : "5",
            "index.number_of_replicas" : "1"
          },

Has anyone got this functionality up and running who can point me in the right direction?


Solution

  • There is an example config for custom analyzers, using the snowball stemmer, in: Why ElasticSearch is not finding my term
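
As a rough sketch of how that might apply here (untested, and assuming an old 0.x/1.x-era Elasticsearch like the one in the question): as far as I understand, a bare `q=fight` is run against the `_all` field, which is analyzed with the default analyzer rather than `stemming`. The commands below assume the index is recreated from scratch, register the mapping on the `article` type explicitly, and query the `title` field by name. The JSON settings body and the `article/_mapping` URL are my own assumptions, not taken from the linked answer.

    # Assumption: start from a clean index (this deletes any existing data).
    curl -XDELETE 'localhost:9200/local/'

    # Same analyzer as in the question, written as a JSON settings body.
    curl -XPUT 'localhost:9200/local/' -d '{
      "settings": {
        "analysis": {
          "analyzer": {
            "stemming": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": ["standard", "lowercase", "stop", "porterStem"]
            }
          }
        }
      }
    }'

    # Assumption: attach the mapping to the "article" type explicitly.
    curl -XPUT 'localhost:9200/local/article/_mapping' -d '{
      "article": {
        "properties": {
          "title": { "type": "string", "analyzer": "stemming" }
        }
      }
    }'

    # Check which tokens the analyzer emits for a few word forms.
    curl -XGET 'localhost:9200/local/_analyze?analyzer=stemming' -d 'fight fights fighting fought'

    # After re-indexing the four articles, search the analyzed field by name.
    curl -XGET 'localhost:9200/local/_search?q=title:fight'

If the analyzer is wired up, I'd expect articles 1, 2 and 4 to match. Article 3 probably won't: Porter stemming only strips suffixes, so it does not map an irregular form like "fought" to "fight". Another option might be to register the analyzer under the name `default`, so that it also applies wherever no analyzer is set explicitly.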