
Ruta escaping special characters


I am working on a Ruta script to identify currency, but I am having trouble with special characters such as the dollar sign ($).

First I tried the plain character:

W{REGEXP("(dollar|nzd|$)") -> MARK(EntityType)};

and then escaping it:

PACKAGE uima.ruta.example;

W{REGEXP("(dollar|nzd|\$)") -> MARK(EntityType)};

In the first case my pattern is not matched; in the second case my editor reports an error.

What is the correct way to match special characters?

Cheers.


Solution

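  • In UIMA Ruta, special characters such as $ are not word tokens; they are covered by the default seed annotation SPECIAL. Your rule matches only on word tokens (W), so it never fires on $. Note also that an unescaped $ in a regular expression is the end-of-input anchor, not the dollar character itself.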

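    If you want to match only $ among the special characters, you can restrict the SPECIAL annotation with a REGEXP condition, just as you did for W. Inside a Ruta string literal the backslash must itself be escaped, so the regex \$ is written "\\$"; a lone \$ is not a valid escape sequence, which is why your editor reports an error: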

    // Declare the annotation types created by the rule
    DECLARE Currency, Amount;
    // I spent $100.
    SPECIAL{REGEXP("\\$") -> Currency} NUM{-> Amount};

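    To see why the unescaped $ never matches the dollar character, here is a small plain-Java sketch of the regex side of the problem. It assumes that Ruta's REGEXP condition uses java.util.regex and tests the whole covered text of the annotation (as Pattern.matches does); it says nothing about Ruta's tokenization, which is the separate W vs. SPECIAL issue above. The class DollarRegexDemo and the method matchesWhole are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class DollarRegexDemo {

    // Hypothetical helper: true if the regex matches the entire text,
    // which is assumed here to mirror how a REGEXP condition checks a covered text.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Unescaped: '$' is the end-of-input anchor, so this alternative
        // can only match the empty string, never the character '$'.
        boolean unescaped = matchesWhole("(dollar|nzd|$)", "$");

        // Escaped: the Java literal "\\$" is the regex \$, a literal dollar sign.
        boolean escaped = matchesWhole("(dollar|nzd|\\$)", "$");

        System.out.println("unescaped matches \"$\": " + unescaped);
        System.out.println("escaped matches \"$\": " + escaped);
    }
}
```

    Writing "\$" directly in Java source is likewise rejected by the compiler as an illegal escape character, because the string literal has to be valid before the regex engine ever sees it.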

    Let me know if this helps.