I use the wildcard # to skip text between rule elements.
However, I always annotate per line, so I can use #{-CONTAINS(BREAK)}.
For example, in
RuleElementA #{-CONTAINS(BREAK)} RuleElementB
both rule elements must be on a single line.
How can I declare or save #{-CONTAINS(BREAK)} so that I can later write just a shortcut, as in the following?
RuleElementA sc RuleElementB
You should first annotate your building blocks (i.e. lines) and then create your target annotations based on them (the so-called bottom-up matching strategy in UIMA Ruta).
To do so, you can annotate all the lines in the input document with a naive approach:
DECLARE Line;
ADDRETAINTYPE(BREAK);
BREAK #{-> MARKONCE(Line)} @BREAK;
REMOVERETAINTYPE(BREAK);
This lets you stay at the line level while creating the target annotations. You can then iterate over all the Line annotations in the document so that each match stays within a single line:
BLOCK (forEach) Line{CONTAINS(W)}{
    RuleElementA # RuleElementB;
}
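To make the block above concrete, here is a minimal sketch. It assumes Line has already been declared and annotated as shown earlier. The types Label, Amount and Entry and the literal "Total" are hypothetical names chosen only for illustration; they stand in for whatever RuleElementA and RuleElementB are in your script.

```ruta
// Hypothetical types for illustration only; they are not part of the question.
DECLARE Label, Amount, Entry;

// Assumed building blocks: the literal "Total" is a Label, every number is an Amount.
"Total"{-> MARK(Label)};
NUM{-> MARK(Amount)};

// Rules inside the block only see one Line at a time,
// so the wildcard # cannot run past the end of that line.
BLOCK(forEach) Line{CONTAINS(Label)} {
    (Label # Amount){-> MARK(Entry)};
}
```

Because the window is restricted to the current Line, the rule needs no -CONTAINS(BREAK) condition on the wildcard: a Label without an Amount on the same line simply produces no Entry.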
Alternatively, you could use the PlainTextAnnotator, which is part of the UIMA Ruta installation package by default. This approach should give you more reliable line detection:
ENGINE utils.PlainTextAnnotator;
TYPESYSTEM utils.PlainTextTypeSystem;
EXEC(PlainTextAnnotator, {Line, EmptyLine});
DECLARE FreeLine, LineFree;
ADDRETAINTYPE(WS);
EmptyLine Line{-> FreeLine};
Line{-> LineFree} BREAK[1,2] @EmptyLine;
Line{-> TRIM(WS)};
FreeLine{-> TRIM(WS)};
LineFree{-> TRIM(WS)};
REMOVERETAINTYPE(WS);
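If you still want a shorthand for the condition itself, a macro condition may be an option. The following is only a sketch under the assumption that your UIMA Ruta version supports macro conditions declared with the CONDITION keyword; noBreak is a hypothetical name, and the exact declaration syntax should be checked against the guide for your version.

```ruta
// Assumption: the installed Ruta version supports macro conditions.
// noBreak is a hypothetical name; it bundles the condition from the question.
CONDITION noBreak() = -CONTAINS(BREAK);

// The wildcard keeps its place in the rule; only the condition is shortened.
RuleElementA #{noBreak()} RuleElementB;
```

Note that this names only the condition, not the whole rule element, so the # still has to be written out in each rule.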