eclipse maven uima ruta dkpro-core

How are you supposed to use the DKPro libraries with UIMA Ruta?


I have studied the default UIMA Ruta Workbench Eclipse project enough to understand most of its moving parts - for instance, why the input/ and output/ folders behave as they do, how to build the project using the jcasgen and other Maven plugins, etc.

But even after hours of studying the project and playing with Maven to try to get it to work, I still have a lot of trouble doing something very simple: using the DKPro libraries (the types especially) from a Ruta script.

My fundamental question is this: what is the path of least resistance to using the types and analysis components from the DKPro and TC libraries within a Ruta script?

My specific questions are:

  1. I noticed that in the desc/type folder of many API jars there are TypeSystemDescription XML files that appear to be suitable for use with Ruta. Is there some way of getting a "master" TypeSystemDescription XML file for the DKPro components?

  2. Is there a project of significant complexity that uses both Ruta and DKPro that I can study?

  3. What is the distinction between an AnalysisEngine, as in what you get from Ruta scripts, and an Analysis Component you write in Java?

Edited to reflect less frustration


Solution

  • Actually, the Ruta and DKPro people do workshops together and sit happily around the campfire afterwards - or at least in a cocktail bar and have some drinks. Unfortunately, we don't get around to doing that very often.

    The kind and number of questions you are asking calls for a tutorial ;)

    Did you have a look at the slides and examples from our joint workshop at GSCL 2013?

    It includes several examples of how to use DKPro Core and Ruta together. In those examples, there is a Maven project responsible for fetching the DKPro Core dependencies and separate Ruta projects then have a dependency on that Maven project and use the analysis engines.

    It should also work to have a single project with both the Ruta and the Maven natures.

    1. The way to get a single type descriptor for all DKPro Core types in your classpath (or rather for all uimaFIT-enabled types in your classpath) is

      import org.apache.uima.fit.factory.TypeSystemDescriptionFactory;
      
      OutputStream os = ...
      TypeSystemDescriptionFactory.createTypeSystemDescription().toXML(os);
      
    2. Check out the GSCL 2013 tutorial examples.

    3. AnalysisComponent represents the view from the inside, i.e. from the perspective of the developer of components (the view from within the framework). AnalysisEngine represents the view from the outside, i.e. from the user of a component/workflow. However, typically one would say "I'm implementing a new analysis engine" and mean "I'm going to subclass JCasAnnotator_ImplBase (an implementation of AnalysisComponent)". See also this post on the UIMA developer mailing list.
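
    For point 1, a minimal self-contained sketch of writing the merged descriptor to a file might look like the following. The class name ExportTypeSystem, the helper writeMergedTypeSystem, and the default output path descriptor/DKProCoreTypes.xml are illustrative assumptions, not part of uimaFIT or DKPro Core. It assumes uimaFIT and whichever DKPro Core modules you need are on the classpath.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.uima.fit.factory.TypeSystemDescriptionFactory;
import org.apache.uima.resource.metadata.TypeSystemDescription;

// Hypothetical helper class; the name is not taken from any library.
public class ExportTypeSystem {

    // Merges every type system that uimaFIT can discover on the classpath
    // and serializes the result as a single XML descriptor.
    static void writeMergedTypeSystem(OutputStream os) throws Exception {
        TypeSystemDescription tsd =
                TypeSystemDescriptionFactory.createTypeSystemDescription();
        tsd.toXML(os);
    }

    public static void main(String[] args) throws Exception {
        // Assumed default location; pass another path as the first argument.
        Path target = Paths.get(
                args.length > 0 ? args[0] : "descriptor/DKProCoreTypes.xml");
        if (target.getParent() != null) {
            Files.createDirectories(target.getParent());
        }
        try (OutputStream os = Files.newOutputStream(target)) {
            writeMergedTypeSystem(os);
        }
        System.out.println("Wrote " + target.toAbsolutePath());
    }
}
```

    Run from the Maven project that pulls in DKPro Core, this should produce one XML file that a Ruta project can then reference as a type system. Where the Workbench expects that file depends on the descriptor paths configured for your project.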

    Disclosure: I am a DKPro Core developer and an Apache UIMA developer.