I need to sort my Elasticsearch documents by the length of a text field. However, this field can sometimes contain a very large amount of text (up to 15,000 characters). I am aware that the ignore_above parameter, which defaults to 256, can be used to limit the length of the text that is indexed.
Is it okay to change the ignore_above parameter to 15,000 to enable sorting by text length?
Alternatively, would it be a better solution to create an additional field, such as textLength, which contains the length of the text (e.g., textLength: 14000), and then sort by this textLength field?
Thanks
You can index fairly long values in text fields. Text fields are not limited by ignore_above; that parameter applies only to keyword fields.
For keyword fields, note that a value longer than ignore_above is not truncated. It is skipped entirely: the string is not indexed at all, although it remains in _source.
Raising ignore_above to 15,000 is not advised. ignore_above counts characters, but Lucene limits a single term to 32766 bytes. That is 32766 ASCII characters, or as few as 8191 characters of UTF-8 text when each character takes 4 bytes. Values with many non-ASCII characters could exceed that limit, and you would have no headroom if your fields grow in the future.
But even if you increase the ignore_above value, you wouldn't be able to sort by the field's length anyway: sorting on a keyword field orders documents by the field's value, not by its length.
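To make the byte limit concrete, here is a small Python sketch of the arithmetic. It only illustrates the numbers and does not talk to Elasticsearch; the helper name fits_in_one_term and the sample strings are made up for this example.

```python
# Lucene rejects any single term longer than 32766 bytes.
LUCENE_MAX_TERM_BYTES = 32766
# A UTF-8 encoded character takes at most 4 bytes.
MAX_UTF8_BYTES_PER_CHAR = 4

# Character count that stays under the byte limit even in the worst case.
safe_ignore_above = LUCENE_MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR


def fits_in_one_term(value: str) -> bool:
    """Hypothetical helper: True if the UTF-8 encoding fits in one Lucene term."""
    return len(value.encode("utf-8")) <= LUCENE_MAX_TERM_BYTES


ascii_text = "a" * 15000      # 1 byte per character in UTF-8
cjk_text = "\u65e5" * 15000   # 3 bytes per character in UTF-8

print(safe_ignore_above)
print(fits_in_one_term(ascii_text), fits_in_one_term(cjk_text))
```

So 15,000 ASCII characters fit in one term, but the same number of 3-byte characters does not.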
The best solution is the alternative that you mention. Define an ingest pipeline with a script processor that computes the textLength field, reindex your existing Elasticsearch documents through that pipeline, and index new documents through the same pipeline so the field is also added at insert time. You can then sort on textLength directly.
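As a rough sketch of that approach, the Python below only builds the JSON request bodies; it does not send them. The pipeline id add-text-length, the source field name text, and the index names are assumptions for illustration, so substitute your own. It also assumes the field holds a single string, not an array.

```python
import json

PIPELINE_ID = "add-text-length"  # hypothetical pipeline name
TEXT_FIELD = "text"              # assumed name of your long text field
LENGTH_FIELD = "textLength"      # field name taken from the question


def pipeline_body() -> dict:
    """Body for PUT _ingest/pipeline/<PIPELINE_ID>."""
    # Painless script: store the string length, or 0 if the field is missing.
    source = (
        f"ctx['{LENGTH_FIELD}'] = ctx['{TEXT_FIELD}'] == null "
        f"? 0 : ctx['{TEXT_FIELD}'].length();"
    )
    return {
        "description": "Store the character length of the text field",
        "processors": [{"script": {"lang": "painless", "source": source}}],
    }


def reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for POST _reindex, running every copied document through the pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": PIPELINE_ID},
    }


def sort_by_length_body(order: str = "desc") -> dict:
    """Body for a search request sorted by the precomputed length."""
    return {"sort": [{LENGTH_FIELD: {"order": order}}]}


# Index names here are placeholders.
for body in (pipeline_body(), reindex_body("docs-old", "docs-new"), sort_by_length_body()):
    print(json.dumps(body, indent=2))
```

For new documents, you can pass pipeline=add-text-length as a query parameter on the index request, or set index.default_pipeline on the index so every write goes through it.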