java uima dkpro-core

Change text in reusable pipeline in DKPro


This question describes how to reuse a pipeline in DKPro, but if I create only one JCas and then try to change its text, I get this exception:

org.apache.uima.cas.CASRuntimeException: Data for Sofa feature setLocalSofaData() has already been set.

How do I get around this?


Solution

  • The sofa data in the CAS can only be set once. It cannot be modified after it has been set.

    In order to re-use a CAS, call the reset() method on it. This clears all annotations and allows you to set the sofa/text again.

    To build a CAS incrementally, a common strategy is to add annotations to the CAS while appending text to a string buffer, and to set the text only once at the end of the process.

    A uimaFIT-based example of re-using a CAS could look something like this:

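    import org.apache.uima.analysis_engine.AnalysisEngine;
    import org.apache.uima.fit.factory.AnalysisEngineFactory;
    import org.apache.uima.fit.factory.JCasFactory;
    import org.apache.uima.jcas.JCas;

    String[] texts = {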
        "Hello world.",
        "This is a test." };
    
    // Create empty CAS/JCas initialized using uimaFIT typesystem auto-detection
    JCas jcas = JCasFactory.createJCas();
    
    // Instantiate some analysis engine
    AnalysisEngine engine = AnalysisEngineFactory.createEngine(...);
    
    // Process texts re-using the previously created CAS/JCas instance
    for (String t : texts) {
        jcas.reset();
        jcas.setDocumentText(t);
        jcas.setDocumentLanguage("en");
        engine.process(jcas);
    }
    
    engine.collectionProcessComplete();
    engine.destroy();
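
    The incremental strategy described above can be sketched without any UIMA dependency. This is only an illustration of the offset bookkeeping, not the UIMA API: the names IncrementalCasSketch, Span, Built and build are hypothetical, and Span merely stands in for a real annotation. The comments mark where a real JCas would presumably come in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: append text to a StringBuilder and record offsets as you go. */
class IncrementalCasSketch {

    /** Hypothetical stand-in for an annotation: a typed [begin, end) span. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical result holder: the final text plus the recorded spans. */
    static final class Built {
        final String text;
        final List<Span> spans;

        Built(String text, List<Span> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    static Built build(List<String> sentences) {
        StringBuilder text = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        for (String sentence : sentences) {
            if (text.length() > 0) {
                text.append(' '); // separator between pieces
            }
            int begin = text.length();
            text.append(sentence);
            int end = text.length();
            // Assumption: with a real JCas, an annotation would be created here,
            // e.g. new Annotation(jcas, begin, end).addToIndexes();
            spans.add(new Span("Sentence", begin, end));
        }
        // Assumption: with a real JCas, jcas.setDocumentText(text.toString())
        // would be called exactly once, here at the end.
        return new Built(text.toString(), spans);
    }

    public static void main(String[] args) {
        Built built = build(Arrays.asList("Hello world.", "This is a test."));
        for (Span s : built.spans) {
            System.out.println(s.type + " [" + s.begin + ", " + s.end + "): "
                    + built.text.substring(s.begin, s.end));
        }
    }
}
```

    Because the offsets are taken from the buffer length before and after each append, they stay valid for the final text, which is why the text only needs to be set once.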
    

    Disclosure: I am working on the Apache UIMA project.