java uima

UIMA CAS FSIndexRepository Merging


I'm currently working on a project with the UIMA framework in Java 11.

An interesting avenue for us is multithreading, specifically using CAS objects in a multithreaded environment. Since CAS objects are not thread-safe, we considered having each thread write to its own View and then merging the different Views.

The merge step is necessary because at some point in the pipeline a node may depend on the annotations of two parent nodes to proceed.

I have gathered that CAS objects store their annotations inside the FSIndexRepository and that every View of a CAS object has its own FSIndexRepository.

That is how I came to the question of whether there is a way of merging two CAS Views.


Solution

  • The views are part of the CAS object, so you should not consider them thread-safe either.

    The common approach to using CAS objects in a multi-threaded environment is to go either the trivial way:

    or the sophisticated way:

    The CasMultiplier would typically internally use the CasCopier to transfer FeatureStructures between source and target CAS objects.

    Of course, you can use a CAS without UIMA's pipelining model (i.e. without CasMultiplier), but the approach using the CasCopier to first distribute data from a source CAS to multiple target CASes, processing these individually and then merging back remains the same.
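    The split, process and merge idea can be sketched roughly as follows. This is only a minimal illustration, assuming uimaj-core is on the classpath and a CAS with just the built-in types; the class `SplitMergeSketch`, its helpers `newCas`, `worker` and `mergeBack`, and the sample text are made up for the example. Each thread only ever touches its own CAS, and the merge happens on the calling thread.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.FeatureStructure;
    import org.apache.uima.cas.text.AnnotationFS;
    import org.apache.uima.cas.text.AnnotationIndex;
    import org.apache.uima.resource.metadata.TypeSystemDescription;
    import org.apache.uima.util.CasCopier;
    import org.apache.uima.util.CasCreationUtils;

    final class SplitMergeSketch {

        // Creates a CAS that only knows UIMA's built-in types (assumption for this sketch).
        static CAS newCas() throws Exception {
            return CasCreationUtils.createCas((TypeSystemDescription) null, null, null);
        }

        // Hypothetical worker: adds one annotation to its private CAS and returns it.
        static Callable<AnnotationFS> worker(CAS cas, int begin, int end) {
            return () -> {
                AnnotationFS ann = cas.createAnnotation(cas.getAnnotationType(), begin, end);
                cas.addFsToIndexes(ann);
                return ann;
            };
        }

        // Copies one feature structure into the target CAS and indexes it there.
        static void mergeBack(CAS from, AnnotationFS ann, CAS target) {
            FeatureStructure copy = new CasCopier(from, target).copyFs(ann);
            target.addFsToIndexes(copy);
        }

        static CAS run() throws Exception {
            CAS source = newCas();
            source.setDocumentText("Hello UIMA world");

            // Split: every worker gets its own full copy of the source CAS.
            CAS left = newCas();
            CAS right = newCas();
            CasCopier.copyCas(source, left, true);
            CasCopier.copyCas(source, right, true);

            ExecutorService pool = Executors.newFixedThreadPool(2);
            try {
                Future<AnnotationFS> a = pool.submit(worker(left, 0, 5));
                Future<AnnotationFS> b = pool.submit(worker(right, 11, 16));
                // Merge: done on the calling thread once each worker has finished.
                mergeBack(left, a.get(), source);
                mergeBack(right, b.get(), source);
            } finally {
                pool.shutdown();
            }
            return source;
        }

        // Helper for inspecting the result: covered text of every indexed annotation.
        static List<String> coveredTexts(CAS cas) {
            List<String> texts = new ArrayList<>();
            AnnotationIndex<AnnotationFS> index = cas.getAnnotationIndex();
            for (AnnotationFS ann : index) {
                texts.add(ann.getCoveredText());
            }
            return texts;
        }
    }
    ```

    Here the workers hand back the annotation they created, which keeps the merge trivial; in a real pipeline you would need some other way to find out what each worker added.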

    Note that you can set a mark in a CAS to tell UIMA to track any FeatureStructures that have been added after the creation of the mark. You can set this mark after each of the split-up CASes has been created to help you discover which FeatureStructures are new in each of the copies, and then merge only those new ones back into the original CAS using the CasCopier in the final step.
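    A rough sketch of the marker idea, under the same assumptions as before (uimaj-core on the classpath, built-in types only). The class `MarkerMergeSketch` and its helpers `newCas`, `annotate` and `mergeNew` are hypothetical names; the UIMA calls used are `CAS.createMarker()`, `Marker.isNew(...)` and `CasCopier.copyFs(...)`. Only annotations are inspected here, which is a simplification.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.FeatureStructure;
    import org.apache.uima.cas.Marker;
    import org.apache.uima.cas.text.AnnotationFS;
    import org.apache.uima.cas.text.AnnotationIndex;
    import org.apache.uima.resource.metadata.TypeSystemDescription;
    import org.apache.uima.util.CasCopier;
    import org.apache.uima.util.CasCreationUtils;

    final class MarkerMergeSketch {

        // Creates a CAS that only knows UIMA's built-in types (assumption for this sketch).
        static CAS newCas() throws Exception {
            return CasCreationUtils.createCas((TypeSystemDescription) null, null, null);
        }

        // Hypothetical stand-in for real analysis: index one plain annotation.
        static void annotate(CAS cas, int begin, int end) {
            AnnotationFS ann = cas.createAnnotation(cas.getAnnotationType(), begin, end);
            cas.addFsToIndexes(ann);
        }

        // Copies only the annotations created after the marker was set; returns how many.
        static int mergeNew(CAS worker, Marker marker, CAS target) {
            List<AnnotationFS> fresh = new ArrayList<>();
            AnnotationIndex<AnnotationFS> index = worker.getAnnotationIndex();
            for (AnnotationFS ann : index) {
                if (marker.isNew(ann)) {
                    fresh.add(ann);
                }
            }
            CasCopier copier = new CasCopier(worker, target);
            for (AnnotationFS ann : fresh) {
                FeatureStructure copy = copier.copyFs(ann);
                target.addFsToIndexes(copy);
            }
            return fresh.size();
        }

        // Returns the covered texts of all annotations in the source CAS after the merge.
        static List<String> demo() throws Exception {
            CAS source = newCas();
            source.setDocumentText("Hello UIMA world");
            annotate(source, 0, 5); // "Hello" exists before the split

            CAS worker = newCas();
            CasCopier.copyCas(source, worker, true);
            Marker marker = worker.createMarker(); // everything created from here on is "new"
            annotate(worker, 6, 10); // "UIMA" is added by the worker

            mergeNew(worker, marker, source);

            List<String> texts = new ArrayList<>();
            AnnotationIndex<AnnotationFS> index = source.getAnnotationIndex();
            for (AnnotationFS ann : index) {
                texts.add(ann.getCoveredText());
            }
            return texts;
        }
    }
    ```

    Because the annotation that was already present in the source is older than the marker, it is skipped during the merge and does not end up in the source CAS twice.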

    Coming back to your original question: if you were working with views, the CasCopier would also be the way to go for copying data between views.
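    For completeness, a small sketch of copying one view into another within the same CAS. It assumes a reasonably recent UIMA version in which `CasCopier` accepts the same CAS as source and target and offers `copyCasView(CAS, CAS, boolean)`; the class name `ViewCopySketch` and the view names "A" and "B" are made up. This has to run on a single thread, since both views belong to the same CAS.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.text.AnnotationFS;
    import org.apache.uima.cas.text.AnnotationIndex;
    import org.apache.uima.resource.metadata.TypeSystemDescription;
    import org.apache.uima.util.CasCopier;
    import org.apache.uima.util.CasCreationUtils;

    final class ViewCopySketch {

        // Returns the covered texts of all annotations found in view "B" after the copy.
        static List<String> demo() throws Exception {
            CAS cas = CasCreationUtils.createCas((TypeSystemDescription) null, null, null);

            // View "A" holds the text and one annotation.
            CAS viewA = cas.createView("A");
            viewA.setDocumentText("Hello UIMA world");
            AnnotationFS ann = viewA.createAnnotation(viewA.getAnnotationType(), 6, 10);
            viewA.addFsToIndexes(ann);

            // View "B" starts out empty; the copier fills in the text and the indexed FSes.
            CAS viewB = cas.createView("B");
            CasCopier copier = new CasCopier(cas, cas);
            copier.copyCasView(viewA, viewB, true);

            List<String> texts = new ArrayList<>();
            AnnotationIndex<AnnotationFS> index = viewB.getAnnotationIndex();
            for (AnnotationFS copied : index) {
                texts.add(copied.getCoveredText());
            }
            return texts;
        }
    }
    ```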

    Disclosure: I am a contributor to the Apache UIMA project.