Tags: java, uima, ruta, heideltime

Add HeidelTime as Analysis Engine in UIMA Ruta Workbench


I would like to run HeidelTime first and then refine and extend the resulting annotations with a UIMA Ruta script. I can of course run the two in sequence in a Java pipeline, but it would be more convenient to do this from within the UIMA Ruta Workbench.

From what I understand of the UIMA Ruta Manual, external analysis engines can be added with the UIMAFIT keyword. I have looked for ways to add the HeidelTime standalone JAR to the CLASSPATH, but I have not managed to get the UIMA Ruta Workbench to detect the HeidelTime analysers.

So my question is: how can I conveniently include HeidelTime in my UIMA Ruta scripts in the UIMA Ruta Workbench? Note that I'm new to UIMA, UIMA Ruta and Eclipse.


Solution

  • First the bad news: you cannot use HeidelTime in UIMA Ruta as a uimaFIT analysis engine, because it is not a uimaFIT component. It could work in principle, but it won't in this case: HeidelTime's initialize() strictly requires default values, and its parameters need non-string values, which UIMA Ruta does not support when injecting parameter values via the declaration. The declaration would look like this:

    UIMAFIT de.unihd.dbs.uima.annotator.heideltime.HeidelTime(Language,german,Date,True,Time,True,Duration,True,Set,True,Temponym,False,Type,news);
    

    The good news is that you can use HeidelTime through its analysis engine descriptor, HeidelTime.xml. However, HeidelTime has a special build setup that UIMA Ruta does not support, so some customization is needed.

    What do you need to do in order to call HeidelTime from within a Ruta script? There are several options. Here's one I tested with UIMA Ruta Workbench 2.6.1:

    1. Copy the descriptors HeidelTime.xml and HeidelTime_TypeSystem.xml to the descriptor folder of your Ruta project.
    2. Modify the HeidelTime.xml descriptor: relink its type system import so that it points to the copy in the same folder: <import location="HeidelTime_TypeSystem.xml"/>
    3. Optionally, do the same for the descriptors of other components, e.g., for tokens and sentences.
    4. Import all descriptors in your script and call the analysis engines, e.g., with mocked tokens and sentences:

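      ENGINE HeidelTime;                // import the HeidelTime.xml analysis engine descriptor
      TYPESYSTEM HeidelTime_TypeSystem; // import HeidelTime's types (Token, Sentence, Timex3, ...)
      ANY{-> Token};                    // mock tokenizer: a Token on every Ruta token
      (# PERIOD){-> Sentence};          // mock sentence splitter: text up to a period...
      (PERIOD # PERIOD){-> Sentence};   // ...and text between two periods
      EXEC(HeidelTime, {Timex3});       // run HeidelTime, then reindex its Timex3 annotations
      t:Timex3{t.timexType == "DATE"};  // do something with a date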
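    Step 2 can also be scripted instead of edited by hand. The following is a minimal sketch, not part of the tested setup above, that uses only the JDK's DOM API to rewrite every <import location="..."> pointing at HeidelTime_TypeSystem.xml so that it refers to the bare file name, i.e., to the copy in the same folder. The class and method names are hypothetical, and the descriptor string in main is a heavily trimmed stand-in with a placeholder original path, not the real contents of HeidelTime.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: relinks a descriptor's type system import to a file in the same folder.
public class RelinkTypeSystemImport {

    public static String relink(String descriptorXml, String typeSystemFile) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // UIMA descriptors use a default XML namespace
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(descriptorXml)));

        // "*" matches <import> elements in any namespace
        NodeList imports = doc.getElementsByTagNameNS("*", "import");
        for (int i = 0; i < imports.getLength(); i++) {
            Element imp = (Element) imports.item(i);
            String location = imp.getAttribute("location");
            // Only touch imports that refer to the type system file, wherever it used to live
            if (location.equals(typeSystemFile) || location.endsWith("/" + typeSystemFile)) {
                imp.setAttribute("location", typeSystemFile);
            }
        }

        // Serialize the modified DOM back to a string, without an XML declaration
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Trimmed, hypothetical stand-in for HeidelTime.xml; "path/to/original/" is a placeholder
        String xml = "<analysisEngineDescription xmlns=\"http://uima.apache.org/resourceSpecifier\">"
                + "<analysisEngineMetaData><typeSystemDescription><imports>"
                + "<import location=\"path/to/original/HeidelTime_TypeSystem.xml\"/>"
                + "</imports></typeSystemDescription></analysisEngineMetaData>"
                + "</analysisEngineDescription>";
        System.out.println(relink(xml, "HeidelTime_TypeSystem.xml"));
    }
}
```

    In practice you would read your copied HeidelTime.xml from disk, pass its contents to relink, and write the result back; editing the single import line by hand works just as well.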
      

    The last thing you need to do for this to work is to add HeidelTime to the classpath of your script's launch delegate. There are two options:

    1. Import the HeidelTime project into your workspace and add a reference to it: right-click your Ruta project, then Properties -> Project References, and check heideltime.
    2. Add the HeidelTime JAR directly to the classpath: open Run Configurations..., select your script, switch to the Classpath tab, and add the JAR there.
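    If HeidelTime still does not seem to be picked up, one quick check is whether a given classpath can load the annotator class at all. The sketch below is plain JDK code with hypothetical class and method names; the class name it checks is the one from the UIMAFIT declaration earlier in this answer. It only reports what the JVM running it can see, so run it with the same JARs on its classpath that you added for the script; it does not inspect the Ruta launch configuration itself.

```java
// Hypothetical diagnostic: can the current class loader load a given class?
public class ClasspathCheck {

    public static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only look the class up, don't run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers a class that is found but whose dependencies
            // (e.g., the UIMA JARs it extends) are missing
            return false;
        }
    }

    public static void main(String[] args) {
        String heidelTime = "de.unihd.dbs.uima.annotator.heideltime.HeidelTime";
        System.out.println(heidelTime + " on classpath: " + isOnClasspath(heidelTime));
    }
}
```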

    I would recommend option 1, since you need the descriptors from the HeidelTime project anyway.

    Overall, I would of course recommend calling HeidelTime in a Java pipeline and not in a Ruta script.

    DISCLAIMER: I am a developer of UIMA Ruta