I am trying to match some multi-word tokens using UIMA RUTA 2.6.0. Some of the phrases partially overlap, e.g. the same wordlist file has the following entries: "includes the", "include the", "in this", "in the".
My input file contains the following text: "1. "Agents or employees" includes the directors...". Obviously, "includes the" should match, but if the other three entries above are present in the wordlist, no match is found. Moreover, the order of the entries in the wordlist does not affect the result: the match always fails.
This issue is not limited to a single file. So, the question is: how can I fix it? Maybe with some setting of the RUTA annotator?
Whitespace in the wordlist can lead to missed matches. If the whitespace is not important, set the configuration parameter 'dictRemoveWS' to true.
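As a rough sketch, assuming the Ruta engine is configured through a standard UIMA XML analysis engine descriptor, the parameter could be set with a fragment like the one below. Only the parameter name 'dictRemoveWS' and its value come from the advice above; where exactly this fragment goes depends on your own descriptor, and the parameter must also be declared there (the descriptors shipped with Ruta should already declare it).

```xml
<!-- Sketch only: belongs inside the <analysisEngineMetaData> element of your Ruta engine descriptor. -->
<configurationParameterSettings>
  <nameValuePair>
    <!-- Ignore whitespace when matching wordlist (dictionary) entries -->
    <name>dictRemoveWS</name>
    <value>
      <boolean>true</boolean>
    </value>
  </nameValuePair>
</configurationParameterSettings>
```

If the engine is created programmatically instead, the same parameter name and boolean value would be passed as a configuration parameter when building the engine description.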
DISCLAIMER: I am a developer of UIMA Ruta