I am working on a Ruta script to identify currency, but I am having trouble with special characters like the dollar sign ($).
I tried with the plain character:
W{REGEXP("(dollar|nzd|$)") -> MARK(EntityType)};
and with escaping it:
PACKAGE uima.ruta.example;
W{REGEXP("(dollar|nzd|\$)") -> MARK(EntityType)};
In the first case my pattern is not recognized; in the second case my editor gives me an error.
What is the correct way to identify special characters?
Cheers.
In UIMA Ruta, special characters are covered by the default seed annotation SPECIAL. Your rule matches only on word tokens (W); therefore it won't fire.
If you want to match only $ as a special character, you can restrict the SPECIAL annotation with a REGEXP condition, just as you do for W:
// I spent $100.
SPECIAL{REGEXP("\\$") -> Currency} NUM{-> Amount};
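If you also want to keep the word-based currency names from your original rule, a minimal sketch could look like the one below. It assumes that Currency and Amount are types you declare yourself in the script (you may already have them in a type system, in which case drop the DECLARE line), and it uses explicit MARK actions. The backslash is doubled because the pattern sits inside a string literal, which is presumably why the single-backslash version was rejected by the editor.

```ruta
PACKAGE uima.ruta.example;

// Assumption: these types are not defined elsewhere, so declare them here.
DECLARE Currency, Amount;

// Currency names such as "dollar" or "nzd" are word tokens (W).
W{REGEXP("dollar|nzd") -> MARK(Currency)};

// "$" is a SPECIAL token, not a word token, so it needs its own rule.
// "\\$" in the script becomes the regex \$ (a literal dollar sign).
SPECIAL{REGEXP("\\$") -> MARK(Currency)} NUM{-> MARK(Amount)};
```

On the example sentence "I spent $100.", the last rule should mark "$" as Currency and "100" as Amount.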
Let me know if this helps.