Difficulty using JAPE Grammar

I have a document that contains sections such as Assessments, HPI, ROS, Vitals, etc., and I want to extract the notes in each section using GATE. I have written a JAPE file that extracts the notes in the Assessments section. Here is the grammar:

Input: Token
Options: control=appelt debug=true

Rule: Assess
({Token.string =~"(?i)diagnose[d]?"}{Token.string=="with"} | {Token.string=~"(?i)suffering"}{Token.string=~"(?i)from"} | {Token.string=~"(?i)suffering"}{Token.string=~"(?i)with"})

(
({Token})*
):assessments

({Token.string =~"(?i)HPI"} | {Token.string =~"(?i)ROS"} | {Token.string =~"(?i)EXAM"} | {Token.string =~"(?i)VITAL[S]"} | {Token.string =~"(?i)TREATMENT[s]"} |{Token.string=~"(?i)use[d]?"}{Token.string=~"(?i)orderset[s]?"} | {Token.string=~"$"})


-->
:assessments.Assessments = {}

When the Assessments section is at the end of the document, I can retrieve the notes properly. But if it sits between two other sections, the rule returns everything from the Assessments section to the end of the file.

I have tried using {Token.string=~"$"} in different ways, but could not extract only the Assessments section irrespective of its place in the document.

Please explain how I can achieve this using JAPE grammar.

Solution

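That behaviour is expected, since Appelt mode always prefers the longest possible overall match. The =~ operator is a "find"-style match, and the regular expression "$" matches at the end of any string, so every Token satisfies string =~ "$". As a result, the assessments label will grab all but the final token in the document.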

I would adopt a two-pass approach: an initial gazetteer or JAPE phase to annotate the "section headings", followed by another phase with only these heading annotations in its Input line:

Imports: { import static gate.Utils.*; }
Phase: AnnotateBetweenHeadings
Input: Heading
Options: control = appelt

Rule: TwoHeadings
({Heading.type == "assessments"}):h1
(({Heading})?):h2
-->
{
  Long endOffset = end(doc);
  AnnotationSet h2Annots = bindings.get("h2");
  if(h2Annots != null && !h2Annots.isEmpty()) {
    endOffset = start(h2Annots);
  }
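  try {
    outputAS.add(end(bindings.get("h1")), endOffset, "Assessments", featureMap());
  } catch(InvalidOffsetException e) {
    // add(Long, Long, ...) declares this checked exception; both offsets
    // come from existing annotations, so it should not occur in practice
    throw new GateRuntimeException(e);
  }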
}

This will annotate everything between the end of the assessments heading and the start of the following heading, or the end of the document if there is no following heading.
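
For the first pass, a minimal sketch of a JAPE phase that creates the Heading annotations might look like the following. The phase name, the rule names, and the "other" value for the type feature are assumptions made for illustration; the trigger words are taken from the grammar in the question, with the optional plural "s" being a guess at what was intended. It uses ==~ (whole-string match) rather than =~ so that a token only counts as a heading when the entire string matches.

Phase: AnnotateHeadings
Input: Token
Options: control = appelt

// Assumption: the assessments section is introduced by one of the
// trigger phrases used in the question's grammar.
Rule: AssessmentsHeading
(
  ({Token.string ==~ "(?i)diagnosed?"} {Token.string ==~ "(?i)with"}) |
  ({Token.string ==~ "(?i)suffering"} {Token.string ==~ "(?i)(from|with)"})
):heading
-->
:heading.Heading = { type = "assessments" }

// Any other section marker; the type value "other" is hypothetical.
Rule: OtherHeading
(
  {Token.string ==~ "(?i)(HPI|ROS|EXAM|VITALS?|TREATMENTS?)"} |
  ({Token.string ==~ "(?i)used?"} {Token.string ==~ "(?i)ordersets?"})
):heading
-->
:heading.Heading = { type = "other" }

Assuming the two grammars are saved as AnnotateHeadings.jape and AnnotateBetweenHeadings.jape in the same directory, a multiphase file along these lines should run them in order:

MultiPhase: SectionExtraction
Phases:
AnnotateHeadings
AnnotateBetweenHeadings

Because the second phase lists only Heading in its Input line, the tokens between two headings are invisible to it, which is why two consecutive Heading annotations can match as adjacent elements.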