solr, lucene, solr5, solr6

Solr dynamic field blowing up the index size


I recently upgraded from Solr 5.0 to Solr 6.4.1. My app runs fine, but the index is far larger under Solr 6: about 15GB in Solr 5 versus 300GB in Solr 6 for the same data. I can't work out what accounts for such a huge difference.

I have identified the field that is blowing up the index size. It is defined as follows.

<dynamicField name="*_note" type="text_general" indexed="true" stored="true" multiValued="true"  />

<field name="textproperty" type="text_general" indexed="true" stored="false" multiValued="true"  />
<copyField source="*_note" dest="textproperty"/>

When this field is commented out, the index size reduces to less than 10GB.

This field is of type text_general. The following is the definition of this type.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="((?m)[a-z]+)'s" replacement="$1s" />
        <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.KStemFilterFactory" /> 
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt" />
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="((?m)[a-z]+)'s" replacement="$1s" />
        <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.KStemFilterFactory" /> 
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt" />
      </analyzer>
  </fieldType>

A few things I did to debug this issue:

Any idea what could increase the index size by so much in Solr 6?


Solution

  • For anyone facing a similar issue: the field that inflated the index had a field type ("text_general") for which omitNorms does not default to true, so norms were being stored for it. Setting omitNorms="true" explicitly on the field fixed the problem. My related question on the Solr mailing list is linked below.

    http://search-lucene.com/m/Solr/eHNlagIB7209f1w1?subj=Fwd+Solr+dynamic+field+blowing+up+the+index+size
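
    A minimal sketch of what that change might look like, assuming omitNorms was added to the dynamic field from the question (the exact edit is not shown here, so treat the placement as an assumption):

    ```xml
    <!-- Assumption: omitNorms="true" is set directly on the dynamic field;
         the other attributes are copied unchanged from the question. -->
    <dynamicField name="*_note" type="text_general" indexed="true" stored="true" multiValued="true" omitNorms="true" />
    ```

    Omitting norms turns off index-time length normalization for the matching fields, which affects relevance scoring, and the data has to be reindexed for the change to take full effect.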