Tags: elasticsearch, lucene, elasticui

Elasticsearch array of strings being tokenized even with not_analyzed in mapping


This has been driving me nuts. I've got a few arrays in my data; here is a slimmed-down version:

{
    "fullName": "Jane Doe",
    "comments": [],
    "tags": [
        "blah blah tag 1",
        "blah blah tag 2"
    ],
    "contactInformation": {
        "attachments": [
            "some file 1",
            "some file 2",
            "some file 3"
        ]
    }
}

Ok, so my mappings in Elasticsearch are as follows:

curl -XPOST localhost:9200/myindex -d '{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "docs" : {
            "properties" : {
                "tags" : { "type" : "string", "index" : "not_analyzed" },
                "attachments" : { "type" : "string", "index" : "not_analyzed" }
            }
        }
    }
}'
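
As a sanity check, you can pull the mapping back out and see what Elasticsearch actually stored for each field (a minimal sketch, assuming the index really is called myindex):

curl -XGET 'localhost:9200/myindex/_mapping?pretty'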

Now if I display these as facets, the tags appear fine, like so:

[ ] - blah blah tag 1

[ ] - blah blah tag 2

However, the attachments are tokenized and I get a facet for every single word, i.e.:

[ ] - some

[ ] - file

[ ] - 1
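
For context, the facets above come from something roughly equivalent to a terms facet on each field (a sketch of the raw query; I'm actually rendering these through elasticui, so the exact request it sends may differ):

curl -XPOST 'localhost:9200/myindex/docs/_search?pretty' -d '{
    "query" : { "match_all" : {} },
    "facets" : {
        "tags" : { "terms" : { "field" : "tags" } },
        "attachments" : { "terms" : { "field" : "attachments" } }
    }
}'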

I was thinking since the attachments property lives inside contactInformation, my mapping might need to look like this: "contactInformation.attachments" : { "type" : "string", "index" : "not_analyzed" }

But that threw an error; it wasn't expecting the dot.

Any ideas?


Solution

  • See the "Complex Core Field Types" documentation (in particular, the section titled "Mapping for Inner Objects").

    It should look something like this:

    "mappings" : {
      "docs" : {
        "properties" : {
          "tags" : { "type" : "string", "index" : "not_analyzed" },
          "contactInformation": {
            "type": "object",
            "properties": {
              "attachments" : { "type" : "string", "index" : "not_analyzed" }
            }
          }
        }
      }
    }
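
    Putting that together, recreating the index with the nested mapping would look roughly like the following sketch (index name and type taken from the question; note that you can't change an existing field from analyzed to not_analyzed in place, so you'll need to drop or recreate the index and reindex your documents):

    curl -XDELETE 'localhost:9200/myindex'

    curl -XPOST 'localhost:9200/myindex' -d '{
        "settings" : {
            "number_of_shards" : 1
        },
        "mappings" : {
            "docs" : {
                "properties" : {
                    "tags" : { "type" : "string", "index" : "not_analyzed" },
                    "contactInformation" : {
                        "type" : "object",
                        "properties" : {
                            "attachments" : { "type" : "string", "index" : "not_analyzed" }
                        }
                    }
                }
            }
        }
    }'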