I have studied the default UIMA Ruta Workbench Eclipse project enough to understand most of its moving parts - for instance, why the input/ and output/ folders behave as they do, how to build the project using the jcasgen and other Maven plugins, etc.
But even after hours of studying the project and experimenting with Maven to get it to work, I still have a lot of trouble doing something very simple: using the DKPro libraries (the types especially) from a Ruta script.
My fundamental question is this: what is the path of least resistance to using the types and analysis components from the DKPro and TC libraries within a Ruta script?
My specific questions are:
I noticed that in the desc/type folder of many api jars there are TypeSystemDescription XML files that would appear to be appropriate for use with Ruta. Is there some way of getting a "master" TypeSystemDescription XML file for the DKPro components?
Is there a project of significant complexity that uses both Ruta and DKPro that I can study?
What is the distinction between an AnalysisEngine, as in what you get from Ruta scripts, and an AnalysisComponent you write in Java?
Edited to reflect less frustration
Actually, the Ruta and DKPro people do workshops together and sit happily around the campfire afterwards - or at least in a cocktail bar and have some drinks. Unfortunately, we don't get around to doing that very often.
The kind and number of questions you are asking calls for a tutorial ;)
Did you have a look at the slides and examples from our joint workshop at GSCL 2013?
It includes several examples of how to use DKPro Core and Ruta together. In those examples, there is a Maven project responsible for fetching the DKPro Core dependencies and separate Ruta projects then have a dependency on that Maven project and use the analysis engines.
It should also work to have a single project with both the Ruta and Maven natures.
The way to get a single type descriptor for all DKPro Core types in your classpath (or rather for all uimaFIT-enabled types in your classpath) is
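import java.io.OutputStream;
import org.apache.uima.fit.factory.TypeSystemDescriptionFactory;

OutputStream os = ...
// Merge every type system uimaFIT detects on the classpath and serialize it as XML
TypeSystemDescriptionFactory.createTypeSystemDescription().toXML(os);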
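To make the snippet above concrete, here is a minimal sketch of a small command-line class that writes the merged descriptor to a file. The class name ExportTypeSystem and the default target path descriptor/types/AllTypes.xml are assumptions for illustration only; pick whatever location your Ruta project actually resolves descriptors from. It also assumes uimaFIT and the DKPro Core API jars are on the classpath, for example via the Maven project mentioned above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.uima.fit.factory.TypeSystemDescriptionFactory;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.xml.sax.SAXException;

public class ExportTypeSystem {

    // Writes one descriptor containing all uimaFIT-detectable types on the classpath.
    static void export(Path target)
            throws ResourceInitializationException, SAXException, IOException {
        TypeSystemDescription tsd = TypeSystemDescriptionFactory.createTypeSystemDescription();

        // Create the parent folder if it does not exist yet.
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }

        // try-with-resources closes the stream even if serialization fails.
        try (OutputStream os = Files.newOutputStream(target)) {
            tsd.toXML(os);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default location; pass a different path as the first argument.
        String location = args.length > 0 ? args[0] : "descriptor/types/AllTypes.xml";
        export(Paths.get(location));
    }
}
```

Once the file sits on the descriptor path of the Ruta project, the script should be able to pull the types in with a TYPESYSTEM declaration; the exact name to use depends on where you placed the file.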
Check out the GSCL 2013 tutorial examples.
AnalysisComponent represents the view from the inside, i.e. from the perspective of the developer of components (the view from within the framework). AnalysisEngine represents the view from the outside, i.e. from the user of a component/workflow. However, typically one would say "I'm implementing a new analysis engine" and mean "I'm going to subclass JCasAnnotator_ImplBase (an implementation of AnalysisComponent)". See also this post on the UIMA developer mailing list.
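The inside/outside distinction may be easier to see in code. The following is only a sketch using uimaFIT; the names InsideOutsideDemo and WholeDocumentAnnotator are made up for this illustration, and the annotator does nothing useful beyond marking the whole document. The nested class is the "inside" view (an AnalysisComponent), while the run method only ever touches the "outside" view (an AnalysisEngine that the framework builds around the component).

```java
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class InsideOutsideDemo {

    // Inside view: the component developer subclasses JCasAnnotator_ImplBase,
    // which is an implementation of AnalysisComponent.
    public static class WholeDocumentAnnotator extends JCasAnnotator_ImplBase {
        @Override
        public void process(JCas jcas) throws AnalysisEngineProcessException {
            // Hypothetical toy logic: one annotation spanning the whole text.
            new Annotation(jcas, 0, jcas.getDocumentText().length()).addToIndexes();
        }
    }

    // Outside view: the user asks the framework for an AnalysisEngine
    // wrapping the component and calls process() on that engine.
    static JCas run(String text) throws Exception {
        AnalysisEngine engine = AnalysisEngineFactory.createEngine(WholeDocumentAnnotator.class);
        JCas jcas = JCasFactory.createJCas();
        jcas.setDocumentText(text);
        engine.process(jcas);
        return jcas;
    }

    public static void main(String[] args) throws Exception {
        JCas jcas = run("Ruta meets DKPro Core.");
        // List every annotation in the CAS with its type and offsets.
        for (Annotation a : JCasUtil.select(jcas, Annotation.class)) {
            System.out.println(a.getType().getName() + " [" + a.getBegin() + ", " + a.getEnd() + "]");
        }
    }
}
```

A Ruta script ends up on the same "outside" side: the Workbench generates an analysis engine descriptor for it, so from the user's point of view it is run like any other AnalysisEngine.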
Disclosure: I am a DKPro Core developer and an Apache UIMA developer.