luceneanalyzerquery-parser

Lucene NOT_ANALYZED not working with uppercase characters


I have build an index using a StandardAnalyzer, in this index are a few fields. For example purposes, imagine it has Id and Type. Both are NON_ANALYZED, meaning you can only search for them as-is.

There are a few entries in my index:

 {Id: "1", Type: "Location"},
 {Id: "2", Type: "Group"},
 {Id: "3", Type: "Location"}

When I search for +Id:1 or any other number, I get the appropriate result (again using StandardAnalyzer).

However, when I search for +Type:Location or the +Type:Group, I'm not getting any results. The strange thing is that when I enable leading wildcards, that +Type:*ocation does return results! +Type:*Location or other combinations do not.

This got me leading to believe the indexer/query doesn't like uppercase characters! After lowercasing the Type to location and group before indexing them, I could search for them as such.

If I turn the Type-field to ANALYZED, it works with pretty much any search (uppercase/lowercase, etc), but I want to query for the Type-field as-is.

I'm completely baffled why it's doing this. Could anyone explain to me why my indexer doesn't let me search for NON_ANALYZED fields that have a capital in their value?


Solution

  • Are you using StandardAnalyzer when parsing your your query string (+Type:Location)? The StandardAnalyzer will lower-case all terms, so you're really searching with +Type:location.

    Always use the same analyzer when searching and indexing. Look into using the PerFieldAnalyzer and set the Type field to use the KeywordAnalyzer.