
Apache Solr - Default Schema Configuration


Below is an example default field type from the managed-schema.xml file. Most examples I have seen reference full classes such as solr.LowerCaseFilterFactory, but the field type below has, for example, a filter named lowercase with no class attribute. Is this configuration actually in effect, or is it just a template?

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer name="standard"/>
    <filter name="stop" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter name="lowercase"/>
    <filter name="englishPossessive"/>
    <filter protected="protwords.txt" name="keywordMarker"/>
    <filter name="porterStem"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer name="standard"/>
    <filter name="synonymGraph" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter name="stop" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter name="lowercase"/>
    <filter name="englishPossessive"/>
    <filter protected="protwords.txt" name="keywordMarker"/>
    <filter name="porterStem"/>
  </analyzer>
</fieldType>

Solution

  • It depends on which version of Solr you're using. Later versions can look up the class from its short name (i.e. without the FilterFactory suffix), so on such a version the configuration in the question is active, not just a template. See the example in the current reference guide:

    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer name="standard"/>
        <filter name="lowercase"/>
        <filter name="englishPorter"/>
      </analyzer>
    </fieldType>
    

    Compared to the legacy format shown in the same guide:

    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"/>
      </analyzer>
    </fieldType>
    

    As you can see, the full class names are highly repetitive. Instead of requiring the complete class name, Solr resolves the factory from the short name and the type of element it appears in (tokenizer or filter).
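
    To tie this back to the question, here is a sketch of how the index analyzer of text_en would look in the legacy format. The class names follow the usual Solr factory naming for each short name; this is an illustrative rewrite, not copied from a shipped schema file:

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <!-- name="standard" -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- name="stop" -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <!-- name="lowercase" -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- name="englishPossessive" -->
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <!-- name="keywordMarker" -->
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <!-- name="porterStem" -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    The query analyzer would follow the same pattern, with synonymGraph corresponding to solr.SynonymGraphFilterFactory. Only the way the factory is referenced changes; the attributes passed to each filter stay the same.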