solr, datastax-enterprise, uima, datastax-startup

DSE Search Apache UIMA integration


I have DSE 5.1 (with Solr 6.0.1) installed on CentOS 7.3.1611 (Core), and UIMA configured in Eclipse for my annotator project.

I am following the Solr documentation: https://wiki.apache.org/solr/Solr4UIMA

The UIMA project has a simple annotator that extracts person names, and it already works fine in the UIMA CAS Visual Debugger. I built the jar and copied it into the Solr lib directory (DSE_HOME/solr/lib), which also holds the Solr jars for the UIMA integration (from SOLR_HOME/contrib/uima/lib, SOLR_HOME/contrib/uima/lucene-lib, and SOLR_HOME/dist/solr-uima-version).

The table I created in Cassandra is:

CREATE TABLE uima_solr.person_annotator (
    id int PRIMARY KEY,
    apellido text,
    nombre text,
    nombrecompleto text,
    solr_query text,
    uimaname set<text>
);

The Solr core uses this schema:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.TextField" name="TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
  </types>
  <fields>
    <field indexed="true" multiValued="false" name="nombrecompleto" stored="true" type="TextField"/>
    <field indexed="true" multiValued="false" name="apellido" stored="true" type="TextField"/>
    <field indexed="true" multiValued="false" name="nombre" stored="true" type="TextField"/>
    <field docValues="true" indexed="true" multiValued="false" name="id" stored="true" type="TrieIntField"/>
    <field indexed="true" multiValued="false" name="all" stored="false" type="TextField"/>
    <field indexed="true" multiValued="true" name="uimaname" stored="true" type="TextField"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>all</defaultSearchField>
  <copyField source="nombrecompleto" dest="all"/>
  <copyField source="apellido" dest="all"/>
  <copyField source="nombre" dest="all"/>
</schema>

And solrconfig.xml is the following:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<config>
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  <luceneMatchVersion>LUCENE_6_0_0</luceneMatchVersion>
  <dseTypeMappingVersion>2</dseTypeMappingVersion>
  <directoryFactory class="solr.StandardDirectoryFactory" name="DirectoryFactory"/>
  <indexConfig>
    <rt>false</rt>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>512</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
    <infoStream file="INFOSTREAM.txt">false</infoStream>
  </indexConfig>
  <jmx/>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxTime>10000</maxTime>
    </autoSoftCommit>
  </updateHandler>
  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <filterCache class="solr.SolrFilterCache" highWaterMarkMB="256" lowWaterMarkMB="128"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <useColdSearcher>true</useColdSearcher>
    <maxWarmingSearchers>16</maxWarmingSearchers>
  </query>
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000"/>
    <httpCaching never304="true"/>
  </requestDispatcher>
  <requestHandler class="solr.SearchHandler" default="true" name="search">
    <lst name="defaults">
      <int name="rows">10</int>
    </lst>
  </requestHandler>
  <requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query">
    <lst name="defaults">
      <int name="rows">10</int>
    </lst>
  </requestHandler>
  <!--<requestHandler class="solr.UpdateRequestHandler" name="/update"/>-->
  <requestHandler class="solr.UpdateRequestHandler" name="/update/csv" startup="lazy"/>
  <requestHandler class="solr.UpdateRequestHandler" name="/update/json" startup="lazy"/>
  <requestHandler class="solr.FieldAnalysisRequestHandler" name="/analysis/field" startup="lazy"/>
  <requestHandler class="solr.DocumentAnalysisRequestHandler" name="/analysis/document" startup="lazy"/>
  <requestHandler class="solr.admin.AdminHandlers" name="/admin/"/>
  <requestHandler class="solr.PingRequestHandler" name="/admin/ping">
    <lst name="invariants">
      <str name="qt">search</str>
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>
  <requestHandler class="solr.DumpRequestHandler" name="/debug/dump">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="echoHandler">true</str>
    </lst>
  </requestHandler>
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>



  <updateRequestProcessorChain default="true" name="uima">
    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
      <lst name="uimaConfig">
        <lst name="runtimeParameters"></lst>
        <!-- Under $SOLR_HOME/solr/example-->
        <str name="analysisEngine">desc/descPersonAnnotator.xml</str>
        <bool name="ignoreErrors">false</bool>
        <lst name="analyzeFields">
          <bool name="merge">false</bool>
          <arr name="fields">
            <str>nombrecompleto</str>
          </arr>
        </lst>
        <lst name="fieldMappings">
          <lst name="type">
            <str name="name">org.apache.uima.annotator.person</str>
            <lst name="mapping">
              <str name="feature">name</str>
              <str name="field">uimaname</str>
            </lst>
          </lst>
        </lst>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>
  <requestHandler class="solr.UpdateRequestHandler" name="/update">
    <lst name="defaults">
      <str name="update.processor">uima</str>
    </lst>
  </requestHandler>


</config>

When I insert data using CQL, the data is indexed correctly in Lucene and search works fine, but the UIMA annotator does not run. Likewise, when I upload documents to Solr through the Solr command, the documents are indexed successfully and search also works from CQL, but UIMA does not run either. I checked the logs and they show no errors.

I followed the same procedure in the Apache Solr distribution (v6.0.1) and it works as expected.

I can't find the core files under DSE_HOME to edit them conveniently, and I haven't managed to get the UIMA integration working in DSE Search. What am I missing in the core config on DSE Search?


Solution

  • I found the answer by following the DSE documentation instead of the Solr documentation procedure, specifically the Update request processor and field transformer (FIT) [2017-07-17].

    An example for DSE can be found here [2017-07-17].

    The procedure from the link above shows how to map fields while indexing: build a jar that adapts the UIMA project to the FIT project, and add its classes to solrconfig.xml.

    This procedure allows searching over the added metadata both with the Solr command and from CQL with solr_query.
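
    As a rough sketch of the "adding the classes to solrconfig.xml" step, the fragment below registers a custom field input/output transformer pair inside the <config> element. The element names and the name="dse" attribute follow the FIT example in the DSE documentation as far as I recall, so check them against the docs for your DSE version. The two com.example.uima class names are hypothetical placeholders for the classes in your own jar that call the UIMA annotator.

    ```xml
    <!-- Sketch only: goes inside <config> in solrconfig.xml. -->
    <!-- Both class names are hypothetical; replace them with the
         transformer classes packaged in your own jar. -->
    <fieldInputTransformer name="dse" class="com.example.uima.PersonFieldInputTransformer"/>
    <fieldOutputTransformer name="dse" class="com.example.uima.PersonFieldOutputTransformer"/>
    ```

    With this approach the UIMA updateRequestProcessorChain from the question is presumably no longer what populates uimaname; the input transformer fills that field while the row is being indexed.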