Tags: java, lucene, lucene.net, examine, query-analyzer

How to stop Lucene Standard Analyzer removing special characters


I have been having some difficulty with Lucene and would appreciate any help.

I have a custom query that is written by hand and then parsed using QueryParser.Parse. I am using version LUCENE_29 and the StandardAnalyzer.

In my query I have a special character (colon) and need this to remain:

+(Name:"test\:word" OR Business:"test\:word hello")

The output after parsing the query text above is:

+(Name:"test word" OR Business:"test word hello")

Does anyone have any suggestions? I tried passing an empty stop-words collection to the StandardAnalyzer constructor, but that has no effect; it still strips out the colon.

Thank you.


Solution

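  • You can't, at least not with StandardAnalyzer. Its tokenizer (StandardTokenizer) treats punctuation such as the colon as a token boundary, so "test:word" becomes the two tokens "test" and "word". The colon is removed during tokenization, not by the stop-word filter, which is why passing an empty stop-word set makes no difference.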

    The answer is to use an Analyzer implementation that doesn't strip special characters (such as WhitespaceAnalyzer), or to build a custom analyzer from existing tokenizers and filters to meet your needs.

    Note that you would also need to index your data with WhitespaceAnalyzer (or whichever analyzer you use at query time). Otherwise the special characters are stripped when the documents are indexed, and they won't be available at query time.
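    Both points can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Lucene code: the names AnalyzerDemo, standardLike, whitespaceLike, index and search are made up for illustration. standardLike only approximates StandardAnalyzer by splitting on anything that is not a letter or digit and lowercasing. Lucene's real grammar also recognizes things like e-mail addresses and host names, and StandardAnalyzer additionally removes stop words. whitespaceLike approximates WhitespaceAnalyzer. The index/search pair is a toy inverted index with AND semantics and no phrase positions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy model only; none of these helpers are part of the Lucene API.
public class AnalyzerDemo {

    // Rough stand-in for StandardAnalyzer: treats every non-letter/non-digit
    // character (including ':') as a separator, then lowercases each token.
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Rough stand-in for WhitespaceAnalyzer: splits on whitespace only and
    // leaves each token (case and punctuation) untouched.
    static List<String> whitespaceLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Builds a minimal inverted index: term -> ids of the documents containing it.
    static Map<String, Set<Integer>> index(List<String> docs,
                                           Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
            }
        }
        return postings;
    }

    // Returns the ids of documents containing every analyzed query term.
    static Set<Integer> search(Map<String, Set<Integer>> postings, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> result = null;
        for (String term : analyzer.apply(query)) {
            Set<Integer> hits = postings.getOrDefault(term, Set.of());
            if (result == null) result = new HashSet<>(hits);
            else result.retainAll(hits);
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        String text = "test:word hello";
        System.out.println(standardLike(text));   // [test, word, hello]
        System.out.println(whitespaceLike(text)); // [test:word, hello]

        List<String> docs = List.of("test:word hello", "test word");

        // Same whitespace-style analyzer at index and query time:
        // only doc 0 contains the literal term "test:word".
        Map<String, Set<Integer>> wsIndex = index(docs, AnalyzerDemo::whitespaceLike);
        System.out.println(search(wsIndex, "test:word", AnalyzerDemo::whitespaceLike)); // [0]

        // Standard-like at index time, whitespace-style at query time:
        // the colon was stripped while indexing, so nothing matches.
        Map<String, Set<Integer>> stdIndex = index(docs, AnalyzerDemo::standardLike);
        System.out.println(search(stdIndex, "test:word", AnalyzerDemo::whitespaceLike)); // []

        // Standard-like on both sides: both documents match, because the
        // distinction between "test:word" and "test word" is gone.
        System.out.println(search(stdIndex, "test:word", AnalyzerDemo::standardLike)); // [0, 1]
    }
}
```

    The last two searches show why the analyzer has to match on both sides: once the colon has been stripped at index time, no query can bring it back. In Lucene terms, that means giving the same Analyzer to the IndexWriter and to the QueryParser.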