Tags: elasticsearch, lucene, elasticui

Elasticsearch array of strings being tokenized even with not_analyzed in mapping


This has been driving me nuts. I've got a few arrays in my data; here is a slimmed-down version:

{
    "fullName": "Jane Doe",
    "comments": [],
    "tags": [
        "blah blah tag 1",
        "blah blah tag 2"
    ],
    "contactInformation": {
        "attachments": [
            "some file 1",
            "some file 2",
            "some file 3"
        ]
    }
}

Ok, so my mappings in Elasticsearch are as follows:

curl -XPOST localhost:9200/myindex -d '{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "docs" : {
            "properties" : {
                "tags" : { "type" : "string", "index" : "not_analyzed" },
                "attachments" : { "type" : "string", "index" : "not_analyzed" }
            }
        }
    }
}'
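
As a sanity check, you can pull the mapping back out and see what Elasticsearch actually stored for each field (a minimal sketch, assuming the index really is called myindex):

curl -XGET 'localhost:9200/myindex/_mapping?pretty'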

Now if I display these as facets, the tags appear fine, like so:

[ ] - blah blah tag 1

[ ] - blah blah tag 2

However, the attachments are tokenized and I get a facet for every single word, i.e.:

[ ] - some

[ ] - file

[ ] - 1
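
For context, the facets above come from something roughly equivalent to a terms facet on each field (a sketch of the raw query; I'm actually rendering these through elasticui, so the exact request it sends may differ):

curl -XPOST 'localhost:9200/myindex/docs/_search?pretty' -d '{
    "query" : { "match_all" : {} },
    "facets" : {
        "tags" : { "terms" : { "field" : "tags" } },
        "attachments" : { "terms" : { "field" : "attachments" } }
    }
}'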

I was thinking since the attachments property lives inside contactInformation, my mapping might need to look like this: "contactInformation.attachments" : { "type" : "string", "index" : "not_analyzed" }

But that threw an error; it wasn't expecting the dot.

Any ideas?


Solution

  • See the "Complex Core Field Types" documentation (in particular, the section titled "Mapping for Inner Objects").

    It should look something like this:

    "mappings" : {
      "docs" : {
        "properties" : {
          "tags" : { "type" : "string", "index" : "not_analyzed" },
          "contactInformation": {
            "type": "object",
            "properties": {
              "attachments" : { "type" : "string", "index" : "not_analyzed" }
            }
          }
        }
      }
    }
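
    Putting that together, recreating the index with the nested mapping would look roughly like the following sketch (index name and type taken from the question; note that you can't change an existing field from analyzed to not_analyzed in place, so you'll need to drop or recreate the index and reindex your documents):

    curl -XDELETE 'localhost:9200/myindex'

    curl -XPOST 'localhost:9200/myindex' -d '{
        "settings" : {
            "number_of_shards" : 1
        },
        "mappings" : {
            "docs" : {
                "properties" : {
                    "tags" : { "type" : "string", "index" : "not_analyzed" },
                    "contactInformation" : {
                        "type" : "object",
                        "properties" : {
                            "attachments" : { "type" : "string", "index" : "not_analyzed" }
                        }
                    }
                }
            }
        }
    }'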