
Solr on a Linux Host: Issue with the Data Import Handler


I am indexing a SQL Server 2016 database with the Solr Data Import Handler (DIH). I am currently using solr-8.6.3.

I was initially working on Windows 10 in standalone mode, where I had configured a schema, a solrconfig, and a core-data-config (for the DIH). I added the JAR files that were necessary to make the DIH work.

On Windows 10, on localhost, there was no problem: the connection to the database was established and the data was collected correctly.

But then I wanted to take Solr to production and run a Solr instance on a Linux host (Debian), which I access with PuTTY from my Windows computer. I am a beginner in Linux, but I managed to get my Solr server working. I put my JAR file (mssql-jdbc-8.4.1.jre14) in the lib folder so that my DIH would work.

I created my core with this command:

sudo -u solr /opt/solr-8.6.3/bin/solr create -c name_core -d core-data-configs

But when I try to do the full import, nothing happens: Request:0 Fetched:0 Skipped:0 Processed:0. I have no error in my log, no "could not load jdbc driver". My Solr logs are empty, with nothing suspicious or unusual. But clearly Solr doesn't reach my SQL Server.
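The same import can also be triggered from the shell, which makes it easier to capture the raw response of the handler. This is only a minimal sketch: it assumes Solr listens on localhost:8983 (its default port), uses the name_core core and the /dataimport handler from this post, and `dih_url` and `dih_request` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Assumptions: Solr listens on localhost:8983 (its default port) and the
# core is the name_core created above; adjust both for your host.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Build the URL for a DIH command (for example status or full-import)
# on the /dataimport request handler of the given core.
dih_url() {
  local core="$1" cmd="$2"
  printf '%s/%s/dataimport?command=%s&wt=json\n' "$SOLR_BASE" "$core" "$cmd"
}

# Show the URL on stderr, then send the request.
# Needs curl and a running Solr instance.
dih_request() {
  local url
  url=$(dih_url "$1" "$2")
  echo "GET $url" >&2
  curl -sS "$url"
}
```

Running `dih_request name_core full-import` followed by `dih_request name_core status` should return the same counters as the admin UI, together with whatever status messages the handler reports.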

Here is the schema:

<schema name="oriente_objet" version="1.6">

<fields>
   <field name="_version_" type="plong" indexed="true" stored="false" multiValued="false"/>
   
   <!-- points to the root document of a block of nested documents. Required for nested
      document support, may be removed otherwise
   -->
   <field name="_root_" type="string" indexed="true" stored="false"/>

   <!-- Only remove the "id" field if you have a very good reason to. While not strictly
     required, it is highly recommended. A <uniqueKey> is present in almost all Solr 
     installations. See the <uniqueKey> declaration below where <uniqueKey> is set to "id".
   -->   
   <field name="id" type="string" indexed="true" stored="true" required="false" multiValued="false" /> 
  
   <field name="ID_S_Object" type="string" indexed="true" stored="true"/> 
   <field name="sNameObject" type="string" indexed="true" stored="true"/>
   <field name="sString" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="sText" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="sURL" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="sFile" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="sObjectSource" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="sObjectDestination" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="sCommentaire" type="string" indexed="true" stored="false" multiValued="true" />
   <field name="sSerieName" type="string" indexed="true" stored="false" multiValued="true" />
   <field name="sSerieValue" type="string" indexed="true" stored="false" multiValued="true" />
   <field name="SumDayDistanceE" type="pdouble" indexed="true" stored="false" />
   <field name="SumDayDistanceN" type="pdouble" indexed="true" stored="false" />
   
   <field name="suggest_field" type="textSuggest" indexed="false" stored="true" multiValued="true" />
   

   <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
   

   
   <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

   
   <copyField source="sNameObject" dest="text"/>
   <copyField source="sNameObject" dest="suggest_field"/>
   <copyField source="sString" dest="text"/>
   <copyField source="sString" dest="suggest_field"/>  
   <copyField source="sObjectSource" dest="suggest_field"/>
   <copyField source="sObjectDestination" dest="suggest_field"/>
   <copyField source="sText" dest="suggest_field"/>
   <copyField source="sURL" dest="suggest_field"/>
   <copyField source="sFile" dest="suggest_field"/>
   <copyField source="sCommentaire" dest="suggest_field"/>
   <copyField source="sSerieName" dest="suggest_field"/>
   <copyField source="sSerieValue" dest="suggest_field"/>
   <copyField source="sObjectSource" dest="text"/>
   <copyField source="sObjectDestination" dest="text"/>
   <copyField source="sText" dest="text"/>
   <copyField source="sURL" dest="text"/>
   <copyField source="sFile" dest="text"/>
   <copyField source="sCommentaire" dest="text"/>
   <copyField source="sSerieName" dest="text"/>
   <copyField source="sSerieValue" dest="text"/>



  

   </fields>
  
 
 <uniqueKey>id</uniqueKey> 

    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/> 
    

    
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

   
    <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
    <fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
    <fieldType name="plong" class="solr.LongPointField" docValues="true"/>
    <fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>
    
    <fieldType name="pints" class="solr.IntPointField" docValues="true" multiValued="true"/>
    <fieldType name="pfloats" class="solr.FloatPointField" docValues="true" multiValued="true"/>
    <fieldType name="plongs" class="solr.LongPointField" docValues="true" multiValued="true"/>
    <fieldType name="pdoubles" class="solr.DoublePointField" docValues="true" multiValued="true"/>

  
    <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
    <fieldType name="pdates" class="solr.DatePointField" docValues="true" multiValued="true"/>
    
    <!--Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->
    <fieldType name="binary" class="solr.BinaryField"/>

  
    <fieldType name="random" class="solr.RandomSortField" indexed="true" />

    
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
    
   
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    
    <fieldType class="solr.TextField" name="textSuggest" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>

 
    <fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

    
   
   
</schema>

The Solr config:


<config>

<luceneMatchVersion>8.6.2</luceneMatchVersion>

<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/contrib/langid/lib/" regex=".*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-\d.*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/contrib/velocity/lib" regex=".*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-velocity-\d.*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/lib/" regex=".*\.jar" /> 



<dataDir>${solr.data.dir:}</dataDir>

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

<schemaFactory class="ClassicIndexSchemaFactory">
</schemaFactory>

<indexConfig>

<lockType>${solr.lock.type:native}</lockType>

<infoStream>true</infoStream>
</indexConfig>

<jmx/>

<updateHandler class="solr.DirectUpdateHandler2">

<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
</updateLog>

<autoCommit>
<maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
<maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

</updateHandler>

<query>

<maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses>

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>

<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

<cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator"/>

<enableLazyFieldLoading>true</enableLazyFieldLoading>

<queryResultWindowSize>20</queryResultWindowSize>

<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

<listener event="newSearcher" class="solr.QuerySenderListener">
<arr name="queries">

</arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst>
<str name="q">static firstSearcher warming in solrconfig.xml</str>
</lst>
</arr>
</listener>

<useColdSearcher>false</useColdSearcher>
</query>

<requestDispatcher>

<httpCaching never304="true"/>

</requestDispatcher>

<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">suggest-data-config.xml</str>
</lst>
</requestHandler>  


<requestHandler name="dismax" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
</lst>
</requestHandler>


<requestHandler name="/select" class="solr.SearchHandler">

<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>

          
          
          
</lst>

</requestHandler>
<requestHandler name="/query" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
<str name="df">text</str>
</lst>
</requestHandler>

<requestHandler name="/browse" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!--  VelocityResponseWriter settings  -->
<str name="wt">velocity</str>
<str name="v.template">browse</str>
<str name="v.layout">layout</str>
<!--  Query settings  -->
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<!--  Faceting defaults  -->
<str name="facet">on</str>
<str name="facet.mincount">1</str>
</lst>
</requestHandler>
<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
<lst name="defaults">
<str name="df">text</str>
</lst>
</initParams>

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<!--  capture link hrefs but ignore div attributes  -->
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
</lst>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_general</str>
<!--  Multiple "Spell Checkers" can be declared and used by this
         component
       -->
<!--  a spellchecker built from a field of the main index  -->
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">text</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<!--  the spellcheck distance measure used, the default is the internal levenshtein  -->
<str name="distanceMeasure">internal</str>
<!--  minimum accuracy needed to be considered a valid spellcheck suggestion  -->
<float name="accuracy">0.5</float>
<!--  the maximum #edits we consider when enumerating terms: can be 1 or 2  -->
<int name="maxEdits">2</int>
<!--  the minimum shared prefix when enumerating terms  -->
<int name="minPrefix">1</int>
<!--  maximum number of inspections per result.  -->
<int name="maxInspections">5</int>
<!--  minimum length of a query term to be considered for correction  -->
<int name="minQueryLength">4</int>
<!--  maximum threshold of documents a query term can appear to be considered for correction  -->
<float name="maxQueryFrequency">0.01</float>
<!--  uncomment this to require suggestions to occur in 1% of the documents
        <float name="thresholdTokenFrequency">.01</float>
       -->
</lst>
<!--  a spellchecker that can break or combine words.  See "/spell" handler below for usage  -->
<lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">solr.WordBreakSolrSpellChecker</str>
<str name="field">name</str>
<str name="combineWords">true</str>
<str name="breakWords">true</str>
<int name="maxChanges">10</int>
</lst>

</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="df">text</str>
<!--  Solr will use suggestions from both the 'default' spellchecker
           and from the 'wordbreak' spellchecker and combine them.
           collations (re-written queries) can include a combination of
           corrections from both spellcheckers  -->
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<!--  org.apache.solr.spelling.suggest.fst  -->
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<!--  org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory  -->
<str name="field">suggest_field</str>
<str name="weightField">price</str>
<str name="suggestAnalyzerFieldType">textSuggest</str>
</lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">mySuggester</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
<!--  Term Vector Component

       http://wiki.apache.org/solr/TermVectorComponent
     -->
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>

<requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="df">text</str>
<bool name="tv">true</bool>
</lst>
<arr name="last-components">
<str>tvComponent</str>
</arr>
</requestHandler>

<searchComponent name="terms" class="solr.TermsComponent"/>
<!--  A request handler for demonstrating the terms component  -->
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
<bool name="distrib">false</bool>
</lst>
<arr name="components">
<str>terms</str>
</arr>
</requestHandler>

<searchComponent name="elevator" class="solr.QueryElevationComponent">
<!--  pick a fieldType to analyze queries  -->
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
<!--  A request handler for demonstrating the elevator component  -->
<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="df">text</str>
</lst>
<arr name="last-components">
<str>elevator</str>
</arr>
</requestHandler>
<!--  Highlighting Component

       http://wiki.apache.org/solr/HighlightingParameters
     -->
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<!--  Configure the standard fragmenter  -->
<!--  This could most likely be commented out in the "default" case  -->
<fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<!--  A regular-expression-based fragmenter
           (for sentence extraction)
         -->
<fragmenter name="regex" class="solr.highlight.RegexFragmenter">
<lst name="defaults">
<!--  slightly smaller fragsizes work better because of slop  -->
<int name="hl.fragsize">70</int>
<!--  allow 50% slop on fragment sizes  -->
<float name="hl.regex.slop">0.5</float>
<!--  a basic sentence pattern  -->
<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
</lst>
</fragmenter>
<!--  Configure the standard formatter  -->
<formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre">
<![CDATA[ <em> ]]>
</str>
<str name="hl.simple.post">
<![CDATA[ </em> ]]>
</str>
</lst>
</formatter>
<!--  Configure the standard encoder  -->
<encoder name="html" class="solr.highlight.HtmlEncoder"/>
<!--  Configure the standard fragListBuilder  -->
<fragListBuilder name="simple" class="solr.highlight.SimpleFragListBuilder"/>
<!--  Configure the single fragListBuilder  -->
<fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder"/>
<!--  Configure the weighted fragListBuilder  -->
<fragListBuilder name="weighted" default="true" class="solr.highlight.WeightedFragListBuilder"/>
<!--  default tag FragmentsBuilder  -->
<fragmentsBuilder name="default" default="true" class="solr.highlight.ScoreOrderFragmentsBuilder">
<!-- 
        <lst name="defaults">
          <str name="hl.multiValuedSeparatorChar">/</str>
        </lst>
         -->
</fragmentsBuilder>
<!--  multi-colored tag FragmentsBuilder  -->
<fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre">
<![CDATA[ <b style="background:yellow">,<b style="background:lawgreen">, <b style="background:aquamarine">,<b style="background:magenta">, <b style="background:palegreen">,<b style="background:coral">, <b style="background:wheat">,<b style="background:khaki">, <b style="background:lime">,<b style="background:deepskyblue"> ]]>
</str>
<str name="hl.tag.post">
<![CDATA[ </b> ]]>
</str>
</lst>
</fragmentsBuilder>
<boundaryScanner name="default" default="true" class="solr.highlight.SimpleBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.maxScan">10</str>
<str name="hl.bs.chars">.,!? </str>
</lst>
</boundaryScanner>
<boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
<lst name="defaults">
<!--  type should be one of CHARACTER, WORD(default), LINE and SENTENCE  -->
<str name="hl.bs.type">WORD</str>

<str name="hl.bs.language">en</str>
<str name="hl.bs.country">US</str>
</lst>
</boundaryScanner>
</highlighting>
</searchComponent>

<queryResponseWriter name="json" class="solr.JSONResponseWriter">

<str name="content-type">text/plain; charset=UTF-8</str>
</queryResponseWriter>
<!-- 
     Custom response writers can be declared as needed...
     -->
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy">
<str name="template.base.dir">${velocity.template.base.dir:}</str>
</queryResponseWriter>

<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
<int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>

</config>

And finally the core-data-config (I think only the connection to the SQL Server is important here):

<dataSource type="JdbcDataSource"   driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://VTEST06\SQL2016;database=name_data;user=sa;password=xxx" />

I already searched Stack Overflow and other sites, but I found nothing similar to this (only questions about MySQL connections). I suppose JDBC doesn't work the same way on a Linux host? I am confused and don't know how to get unstuck.

I tried to check /var/log, but I couldn't find the right directory.
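For a Solr installed as a service, the logs usually live outside /var/log. Here is a minimal sketch for locating them. It assumes the default of Solr's service installer (/var/solr/logs) or of a plain extracted install (server/logs under the install directory, here /opt/solr-8.6.3 as in the command above). `first_existing_dir` is a hypothetical helper, and the paths on another host may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first argument that is an existing
# directory and return 0; return 1 if none of the candidates exists.
first_existing_dir() {
  local d
  for d in "$@"; do
    if [ -d "$d" ]; then
      printf '%s\n' "$d"
      return 0
    fi
  done
  return 1
}
```

A call such as `first_existing_dir "${SOLR_LOGS_DIR:-}" /var/solr/logs /opt/solr-8.6.3/server/logs` prints the first candidate that exists; solr.log in that directory is then the file to read.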

Maybe I have to install some software, or something else, on my Linux host.

I already tried moving my JAR file in case that was the cause, adding the corresponding line to the Solr config:

<lib dir="${solr.install.dir:../../../..}/[blabla]/" regex=".*\.jar" /> 

If you have any leads or suggestions, or have encountered this problem, please do not hesitate to share them. Thank you for your time.


Solution

  • In case someone encounters the same problem: I solved it by using the debug mode in Solr. To do so, I added the following to the solr.in.sh file located in /etc/default:

    -Denable.dih.dataConfigParam=true 
    

    The debug mode showed that the version of my mssql connector was not suited to the Java version on my Linux host.
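This kind of mismatch can also be checked from the shell before running an import. The sketch below assumes the driver jar follows the `mssql-jdbc-<version>.jre<N>.jar` naming seen in this post, where `<N>` is the Java release the jar targets, and that a jar built for release N needs a JVM of release N or newer. The three function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the Java release a driver jar targets
# with the Java release that runs Solr.

# Extract N from a file name such as mssql-jdbc-8.4.1.jre14.jar.
jar_java_release() {
  basename "$1" | sed -n 's/.*\.jre\([0-9][0-9]*\).*/\1/p'
}

# Extract the major release from the first line of `java -version` output:
# 'openjdk version "11.0.9" ...' gives 11, 'java version "1.8.0_271"' gives 8.
java_major_release() {
  local v
  v=$(printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p')
  case "$v" in
    1.*) v=${v#1.}; printf '%s\n' "${v%%.*}" ;;  # legacy 1.x numbering
    *)   printf '%s\n' "${v%%[.+-]*}" ;;         # 9 and later
  esac
}

# Succeed only if both releases could be parsed and the running Java
# is at least the release the jar was built for.
driver_matches_java() {
  local jar_rel java_rel
  jar_rel=$(jar_java_release "$1")
  java_rel=$(java_major_release "$2")
  [ -n "$jar_rel" ] && [ -n "$java_rel" ] && [ "$java_rel" -ge "$jar_rel" ]
}
```

Since `java -version` writes to stderr, a call would look like `driver_matches_java mssql-jdbc-8.4.1.jre14.jar "$(java -version 2>&1)"`. A non-zero exit status suggests picking the driver jar whose `jre<N>` suffix fits the Java installed on the host, or upgrading Java.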