
Solr dynamic field blowing up the index size

I recently upgraded from Solr 5.0 to Solr 6.4.1. My app runs fine, but the index is far larger under Solr 6: about 15GB in Solr 5 versus 300GB in Solr 6 for the same data. I can't work out what accounts for such a huge difference.

I have identified the field that is blowing up the index size. It is defined as follows.

<dynamicField name="*_note" type="text_general" indexed="true" stored="true" multiValued="true"  />

<field name="textproperty" type="text_general" indexed="true" stored="false" multiValued="true"  />
<copyField source="*_note" dest="textproperty"/>

When this field is commented out, the index size drops to less than 10GB.

This field is of type text_general, which is defined as follows.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="((?m)[a-z]+)'s" replacement="$1s" />
        <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.KStemFilterFactory" /> 
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt" />
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="((?m)[a-z]+)'s" replacement="$1s" />
        <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.KStemFilterFactory" /> 
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt" />
      </analyzer>
</fieldType>

A few things I did to debug this issue:

Any idea what could increase the index size by so much in Solr 6?


  • For anyone facing a similar issue: the field that inflated the index disproportionately had a field type ("text_general") for which omitNorms did not default to true. Setting omitNorms="true" explicitly on the field fixed the problem. Following is the link to my related question on the Solr mailing list.
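
For reference, here is a minimal sketch of what that fix might look like in the schema from the question. The comment does not say which declaration was changed, so setting omitNorms on both the dynamic field and the copyField destination is an assumption; everything else is copied from the question unchanged.

```xml
<!-- Assumption: omitNorms="true" is added to both declarations; the original
     comment only says it was set "explicitly on the field". -->
<dynamicField name="*_note" type="text_general" indexed="true" stored="true" multiValued="true" omitNorms="true" />

<field name="textproperty" type="text_general" indexed="true" stored="false" multiValued="true" omitNorms="true" />
<copyField source="*_note" dest="textproperty"/>
```

Keep in mind that omitting norms turns off length normalization for these fields, so relevance scoring on them may change, and the existing data most likely has to be reindexed before the index size reflects the new setting.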