I am using Solr 8.2.0. I am trying to configure proximity search in Solr, but it doesn't seem to remove the stopwords from the query.
<fieldType name="psearch" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
</analyzer>
</fieldType>
I have listed the stopwords in the stopwords.txt file in the config directory. At index time Solr removes these words, as the screenshot of the indexed terms shows.
I also checked the Analysis tab, and the stopwords are removed there as well.
And here is the field:
<field name="pSearchField" type="psearch" indexed="true" stored="true" multiValued="false" />
<copyField source="example" dest="pSearchField"/>
But when I set the proximity (slop) to 1, 2 or 3, the query returns no results.
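This is a known problem in Solr 5 and later: when the StopFilter removes a stopword, it no longer rewrites the positions of the tokens that follow (the old enablePositionIncrements="false" option is gone), so every removed stopword leaves a gap in the token positions. This issue, along with a few suggestions for fixing it, is tracked in SOLR-6468.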
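To see why the slop runs out, here is a small model in plain Java. It uses no Lucene classes, and the class and method names are made up for illustration. It assumes a document "apple of my eye" with "of" and "my" as stopwords and a query for the phrase "apple eye". With the gaps kept, the two remaining terms sit three positions apart, so the phrase needs a slop of 2. Closing the gaps brings the required slop down to 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StopwordGapDemo {

    // Positions of the kept tokens when removed stopwords leave gaps:
    // each kept token keeps its original position index.
    static List<Integer> positionsWithGaps(List<String> tokens, Set<String> stopwords) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!stopwords.contains(tokens.get(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    // Positions of the kept tokens when the gaps are closed up:
    // kept tokens are renumbered 0, 1, 2, ...
    static List<Integer> positionsWithoutGaps(List<String> tokens, Set<String> stopwords) {
        List<Integer> positions = new ArrayList<>();
        int next = 0;
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                positions.add(next++);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        Set<String> stopwords = Set.of("of", "my");
        List<String> doc = List.of("apple", "of", "my", "eye");

        // The query "apple eye" puts its terms at positions 0 and 1, so any
        // extra distance between them in the document must be covered by slop.
        List<Integer> gaps = positionsWithGaps(doc, stopwords);
        List<Integer> noGaps = positionsWithoutGaps(doc, stopwords);
        int slopWithGaps = (gaps.get(1) - gaps.get(0)) - 1;
        int slopWithoutGaps = (noGaps.get(1) - noGaps.get(0)) - 1;

        System.out.println("with gaps:    " + gaps + ", slop needed = " + slopWithGaps);       // [0, 3], 2
        System.out.println("without gaps: " + noGaps + ", slop needed = " + slopWithoutGaps);  // [0, 1], 0
    }
}
```

In this model, "apple eye"~1 would miss the document while the gaps are present, which matches the symptom in the question.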
The easiest solution is to introduce a MappingCharFilterFactory, but I'm skeptical of it, because it changes characters inside words as well (i.e. mapping "to" => "" would also affect "veto", not just "to"). This could possibly be handled with multiple PatternReplaceCharFilterFactory instances instead.
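The concern about the mapping char filter can be shown with plain java.util.regex, which is also the regex flavor PatternReplaceCharFilterFactory uses for its pattern attribute. In this sketch (class and method names are hypothetical), a literal replacement of "to" also mangles "veto", whereas a \b word-boundary pattern removes only the stand-alone word. The (?i) prefix makes the match case-insensitive, since a char filter runs before the LowerCaseFilter.

```java
import java.util.regex.Pattern;

public class StopwordCharFilterDemo {

    // Naive literal replacement, similar in effect to mapping "to" => "":
    // it also strips "to" out of the middle of other words.
    static String naiveRemove(String text, String stopword) {
        return text.replace(stopword, "");
    }

    // Whole-word replacement with word boundaries, the kind of pattern
    // a PatternReplaceCharFilterFactory could be given.
    static String wholeWordRemove(String text, String stopword) {
        Pattern p = Pattern.compile("(?i)\\b" + Pattern.quote(stopword) + "\\b");
        return p.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "veto to the bill";
        System.out.println("[" + naiveRemove(text, "to") + "]");     // [ve  the bill]
        System.out.println("[" + wholeWordRemove(text, "to") + "]"); // [veto  the bill]
    }
}
```

Because a char filter runs before the tokenizer, the removed word never becomes a token, so no position gap is created; the tokenizer simply skips the leftover extra space.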
Another option, shown in the comment thread of the ticket, is to use a custom filter that rewrites the position increment of each token:
package filters;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
// In Lucene 8.x (Solr 8.2), TokenFilterFactory lives in org.apache.lucene.analysis.util.
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class RemoveTokenGapsFilterFactory extends TokenFilterFactory {

    public RemoveTokenGapsFilterFactory(Map<String, String> args) {
        super(args);
        // Reject unsupported attributes instead of silently ignoring them.
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    @Override
    public TokenStream create(TokenStream input) {
        return new RemoveTokenGapsFilter(input);
    }
}

// Sets every token's position increment to 1, closing the gaps left by
// removed stopwords. Tokens stacked at increment 0 (e.g. synonyms) are
// also moved to positions of their own.
final class RemoveTokenGapsFilter extends TokenFilter {

    private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);

    public RemoveTokenGapsFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        posIncrAtt.setPositionIncrement(1);
        return true;
    }
}
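The effect of RemoveTokenGapsFilter can be modelled without Lucene by working on plain arrays of position increments. This is a simplified sketch with hypothetical class and method names, and it assumes the filter sits after the StopFilterFactory in both analyzers. It shows the filter turning the increment of 3, left by two removed stopwords, back into 1. It also shows that a token stacked at increment 0, such as a synonym, gets pushed to a position of its own, because the filter sets every increment to 1 unconditionally.

```java
import java.util.Arrays;

public class PositionIncrementDemo {

    // Converts position increments to absolute positions, so that a
    // first increment of 1 gives position 0, as in Lucene.
    static int[] toPositions(int[] increments) {
        int[] positions = new int[increments.length];
        int pos = -1;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            positions[i] = pos;
        }
        return positions;
    }

    // Models RemoveTokenGapsFilter: every increment becomes 1.
    static int[] removeGaps(int[] increments) {
        int[] out = new int[increments.length];
        Arrays.fill(out, 1);
        return out;
    }

    public static void main(String[] args) {
        // "apple of my eye" with "of" and "my" removed: "eye" arrives with increment 3.
        int[] gappy = {1, 3};
        System.out.println(Arrays.toString(toPositions(gappy)));             // [0, 3]
        System.out.println(Arrays.toString(toPositions(removeGaps(gappy)))); // [0, 1]

        // A token stacked on the previous one (increment 0) loses its stacking.
        int[] stacked = {1, 0, 1};
        System.out.println(Arrays.toString(toPositions(stacked)));             // [0, 0, 1]
        System.out.println(Arrays.toString(toPositions(removeGaps(stacked)))); // [0, 1, 2]
    }
}
```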
As far as I know, there is currently no perfect built-in solution to this issue.