I have started learning GATE application and I would like to use it to extract information from an unstructured document. The information I am interested in are date, location, event information and person’s names. I would like to get information about events that happened at a specific location on a specific date and the person/s name. I have been reading the GATE manual and thats how I got the glimpse on how to build your pipeline. However, I am not figuring out how I can create my new annotation types and make sure that they are annotated to a new annotation set which should appear under the annotation sets on the right. I found similar questions like GATE - How to create a new annotation SET? but it didn help me either.
Let me explain what I did so far:
I identified my patterns in the document e.g for Date formats like ddmm, dd.mm.yyyy
I wrote JAPE rule for each pattern in a separate .jape file
This is how my JAPE Rule looks like for one date format:
Phase: datesearching
Input: Token Lookup SpaceToken
Options: control = appelt
////////////////////////////////////Macros
//Initialization of regular expressions
Macro: DAY_ONE
({Token.kind == number,Token.category==CD, Token.length == "1"})
Macro: C
({Token.kind == number,Token.category==CD, Token.length == "2"})
Macro: YEAR
({Token.kind == number,Token.category==CD, Token.length == "4"})
Macro: MONTH
({Lookup.majorType=="Month"})
Rule: ddmmyyydash
(
(DAY_ONE|DAY_TWO)
({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
(MONTH)
({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
(YEAR)
)
:ddmmyyyydash
-->
:ddmmyyyydash.DateMonthYearDash= {rule = "ddmmyyyydash"}
Can someone please help me with what I should do to make sure that DateMonthYearDash is created as a new annotation set? How do I do it? Thanks a lot.
When I change the outputAsName of the Jape Transducer the new set is not appearing like the rest. This is how it looks:
As said, linked or quoted in the question you mention (GATE - How to create a new annotation SET?), you have two options:
JAPE transducer (similarly to many other GATE PRs) simply takes some input annotations and based on them it creates some new output annotations. The input and output annotation sets names can be configured by inputASName
and outputASName
run-time parameters. inputASName
says where it should look for input annotations and outputASName
says where it should put output annotations to.
The input annotation set must contain the necessary input annotations before the JAPE transducer PR is executed. These annotations are usually created by preceding PRs in the pipeline. Otherwise it will not see the necessary input annotations and it will not produce anything.
The output annotation set may be empty or it may contain anything before the JAPE execution. It doesn't matter. The import thing is that the new output annotations (DateMonthYearDash
in your case) are created there when the JAPE transducer PR execution finished.
So after successful JAPE execution you should see the new annotations there.
Note that annotation sets have names.
While annotations have type, id, offsets, features and annotation set they belong to.
I found some issues in your JAPE grammar:
SpaceToken
unless you explicitly use them in your grammar or you are sure there will be none inside the pattern... See also: Concept of Space Token in JAPE({Lookup.majorType=="Month"})
-> ({Lookup.minorType=="month"})
(DAY_ONE|DAY_TWO)
-> (DAY_ONE)
After corrections + after ANNIE pipeline for document 9 - January - 2017
:
JAPE grammar after corrections:
Phase: datesearching
Input: Token Lookup
Options: control = appelt
Macro: DAY_ONE
({Token.kind == number,Token.category==CD, Token.length == "1"})
Macro: YEAR
({Token.kind == number,Token.category==CD, Token.length == "4"})
Macro: MONTH
({Lookup.minorType=="month"})
Rule: ddmmyyydash
(
(DAY_ONE)
({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
(MONTH)
({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
(YEAR)
)
:ddmmyyyydash
-->
:ddmmyyyydash.DateMonthYearDash= {rule = "ddmmyyyydash"}
You have to investigate the input annotations and "debug" your JAPE grammar. Usually there is some expected input annotation missing or there is some extra annotation you did not expect to be there. There is a nice view in GATE for this purpose: annotation stack. Also some features of input annotations can have different name or value than you expected (e.g. What is correct: {Lookup.majorType=="Month"}
or {Lookup.minorType=="month"}
?).
By "debugging" a JAPE grammar I mean: try to simplify the rule as far as it starts working. Keep trying it on a simple document where it should match for sure. So in your case you can try it without the (DAY_ONE)
part. If it still doesn't work, try only (MONTH)({Token.string == "-"})(YEAR)
, or even (MONTH)
only, etc. Until you find the mistake in the grammar...