
Elasticsearch mapping: is there a disadvantage in using type text for properties which are keyword by nature?


My stack: Elasticsearch 5.4 (with the corresponding versions of the Java client and Kibana)

Hi, I'm using dynamic mapping when creating new indices, with the following dynamic template for unknown properties:

    {
      "string_fields": {
        "match": "*",
        "match_mapping_type": "string",
        "mapping": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }

I'm indexing about 30k documents per second, and the number of unique unknown properties can be large (around 5k across all indices).

Questions:
Is there any performance hit (latency/computation/memory/disk) I should worry about when indexing properties as text when they are really keyword by nature?

Should I make an effort in my application logic to decide whether each new unknown property is better mapped as text or as keyword only?


Solution

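  • You should definitely try to identify those fields and not map them as text. There's a burden associated with text fields because they go through the analysis pipeline: their values get tokenized, filtered (e.g. lowercased), and every resulting term gets indexed. Note that with your template every string value is actually indexed twice, once analyzed as text and once as the keyword sub-field.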

    If you ever need to perform full-text search on them, then definitely index them as text, otherwise try not to. You'll save CPU cycles and disk space while indexing, heap while querying and time when restarting your cluster (because your indexes will be smaller).

    I've only scratched the surface here, but the bottom line is that text comes with a burden, keyword much less so.
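If you go the application-logic route, the check can be cheap. Here is a minimal Java sketch, not a tested recipe: the class `FieldTypeHeuristic`, its methods, and the rule "if at least half of the sampled values contain whitespace, treat the field as prose" are hypothetical choices, so tune them against your real data. It returns a per-property mapping fragment in Elasticsearch 5.x syntax.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: guesses whether a newly seen string property needs
 * full-text search ("text") or only exact matching/aggregations ("keyword").
 * The rule and threshold below are illustrative assumptions, not Elasticsearch defaults.
 */
final class FieldTypeHeuristic {

    // Assumption: values containing inner whitespace look like prose.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // Assumption: if at least half the samples look like prose, map the field as text.
    private static final double TEXT_RATIO = 0.5;

    private FieldTypeHeuristic() {}

    /** Returns "text" or "keyword" for a field, based on a few sample values. */
    static String chooseType(List<String> samples) {
        if (samples.isEmpty()) {
            return "keyword"; // cheapest default when nothing is known yet
        }
        long proseLike = samples.stream()
                .filter(v -> WHITESPACE.matcher(v.trim()).find())
                .count();
        return proseLike >= TEXT_RATIO * samples.size() ? "text" : "keyword";
    }

    /** Builds the JSON mapping fragment for one property (Elasticsearch 5.x syntax). */
    static String mappingFor(String type) {
        if ("text".equals(type)) {
            // Full-text search plus a keyword sub-field for sorting/aggregations.
            return "{\"type\":\"text\",\"fields\":{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":256}}}";
        }
        // Exact values only: no analysis chain, a single term per value.
        return "{\"type\":\"keyword\",\"ignore_above\":256}";
    }

    public static void main(String[] args) {
        List<String> statusCodes = Arrays.asList("OK", "NOT_FOUND", "OK", "TIMEOUT");
        List<String> comments = Arrays.asList("user could not log in", "works fine after restart", "n/a");
        System.out.println(mappingFor(chooseType(statusCodes)));
        System.out.println(mappingFor(chooseType(comments)));
    }
}
```

Running `main` prints the keyword-only fragment for the status codes and the text-plus-keyword fragment for the comments. You would then apply the fragment with the put-mapping API before the first document carrying that property gets indexed; that wiring is left out here because it depends on how you use the Java client.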