I'm using Elasticsearch version 6.8. I want to store an identifier (a string with a combination of letters, numbers, and possibly whitespace). The only filter I will use on that field is the exists filter (to check whether the value is set). What is the best option here: the keyword type or the text type? For the text type I can probably set
"norms": false,
"index_options": "freqs"
to reduce the index size.
The documentation states that, as this is "structured" text, the best option would be to use the keyword type, but as the number of possible values is huge (it's an ID), I'm afraid this would take a lot of disk space.
I have an index with millions of records so I want to keep the disk usage low for this field. Which option is the best regarding the disk space, and what is the performance impact?
Since you don't want to search on the values of this field or run aggregations on them, you should store this field as keyword with doc_values disabled.
"fieldName": {
"type": "keyword",
"doc_values": false
}
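Disabling doc_values will save you disk space, because that on-disk columnar structure is only needed for sorting, aggregations, and scripting on the field.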
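For context, here is a minimal sketch of how the mapping fragment above could sit inside a full index-creation body, together with the exists filter from the question. The index name my_index and the field name identifier are hypothetical, and the _doc mapping type is an assumption about how the 6.8 index is set up. The script only builds and prints the JSON request bodies; it does not talk to a cluster.

```python
import json

# Hypothetical names, chosen only for illustration.
INDEX_NAME = "my_index"
FIELD_NAME = "identifier"

# Request body for: PUT /my_index
# Elasticsearch 6.x expects a mapping type under "mappings";
# "_doc" is assumed here.
create_index_body = {
    "mappings": {
        "_doc": {
            "properties": {
                FIELD_NAME: {
                    "type": "keyword",
                    "doc_values": False,  # skip the columnar on-disk structure
                }
            }
        }
    }
}

# Request body for: GET /my_index/_search
# Matches documents in which the field has a non-null value.
exists_query_body = {
    "query": {
        "bool": {
            "filter": [
                {"exists": {"field": FIELD_NAME}}
            ]
        }
    }
}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(exists_query_body, indent=2))
```

The printed bodies can be pasted into Kibana Dev Tools or sent with curl. To compare disk usage, you could index the same sample of documents into two test indices (one keyword, one text) and compare their store sizes.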
Fields mapped as text do not have doc_values enabled and could use less space, but they are analyzed and could take space in memory.
If you don't care at all about the value of the field, you can even replace it with a simple string or a single digit during ingestion, depending on how you are ingesting your data.
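As a rough sketch of that last idea, assuming documents pass through a Python step before indexing: the function below replaces any set identifier with a constant marker and drops the field when it is null, so an exists filter still separates the two cases. The function name, the field name identifier, and the marker "1" are all hypothetical.

```python
def collapse_identifier(doc, field="identifier", marker="1"):
    """Return a copy of doc with a set identifier replaced by a constant marker.

    A missing or None value is left out entirely, so the field
    still does not exist for that document.
    """
    out = dict(doc)
    if out.get(field) is None:
        out.pop(field, None)   # keep the field absent
    else:
        out[field] = marker    # the original value is discarded
    return out


print(collapse_identifier({"identifier": "AB 12 cd", "title": "first"}))
print(collapse_identifier({"identifier": None, "title": "second"}))
```

This only makes sense if the original identifier is never needed again, since it is no longer available in the indexed document.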