I am using Compass-based indexing in my project. My annotation-based configuration for the field 'name' is:
@SearchableProperty(name="name")
@SearchableMetaData(name="ordering_name", index=Index.NOT_ANALYZED)
private String name;
Now the following values are stored for the 'name' field:
1. Temp 0 New n/a
2. e/f search
3. c/d search
The search results for different scenarios are as follows:
1. 'c/d' -> +(+alias:TempClass +(c/d*)) +(alias:TempClass) -> 1 record found
2. 'n/a' -> +(+alias:TempClass +(n/a*)) +(alias:TempClass) -> 0 records found
3. 'search' -> +(+alias:TempClass +(search*)) +(alias:TempClass) -> 2 records found
So when I search for 'n/a', it should find the first record with the value 'Temp 0 New n/a'.
Any help would be highly appreciated!
At some point your query analysis doesn't match your document analysis. Most likely you are internally using Lucene's StandardAnalyzer for query parsing but not at index time, as denoted by:
@SearchableMetaData(name="ordering_name", index=Index.NOT_ANALYZED)
The StandardTokenizer used inside this analyzer treats the character '/' as a word boundary (just as a space would be), producing the tokens 'n' and 'a'. Later on, the token 'a' is removed by a StopFilter because it is an English stop word.
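You can verify that 'a' is indeed in the analyzer's default stop set (a quick check, assuming Lucene 3.6 on the classpath):

import org.apache.lucene.analysis.standard.StandardAnalyzer;

// "a" is among StandardAnalyzer's default English stop words, so any such
// token produced by the tokenizer is discarded by the StopFilter.
System.out.println(StandardAnalyzer.STOP_WORDS_SET.contains("a")); // prints: true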
The following code is an example for this explanation (the input is "c/d e/f n/a"):
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.Version;

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
TokenStream tokenStream = analyzer.tokenStream("CONTENT", new StringReader("c/d e/f n/a"));
CharTermAttribute term = tokenStream.getAttribute(CharTermAttribute.class);
PositionIncrementAttribute position = tokenStream.getAttribute(PositionIncrementAttribute.class);
int pos = 0;
while (tokenStream.incrementToken()) {
    String termStr = term.toString();
    int incr = position.getPositionIncrement();
    if (incr == 0) {
        // Same position as the previous token (e.g. a synonym): print on the same line
        System.out.print(" [" + termStr + "]");
    } else {
        pos += incr;
        System.out.println(" " + pos + ": [" + termStr + "]");
    }
}
You'll see the following extracted tokens:
1: [c]
2: [d]
3: [e]
4: [f]
5: [n]
Notice that the expected entry 6: [a] is missing. As you can see, Lucene's QueryParser also performs this tokenization:
QueryParser parser = new QueryParser(Version.LUCENE_36, "content", new StandardAnalyzer(Version.LUCENE_36));
System.out.println(parser.parse("+n/a*"));
The output is:
+content:n
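For comparison, parsing the same required term with a WhitespaceAnalyzer, which splits only on whitespace, keeps n/a intact. A quick sketch (same Lucene 3.6 classes as above):

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

// WhitespaceAnalyzer does not split on '/' and has no stop words,
// so the term survives query parsing unchanged.
QueryParser parser = new QueryParser(Version.LUCENE_36, "content", new WhitespaceAnalyzer(Version.LUCENE_36));
System.out.println(parser.parse("+n/a")); // expected output: +content:n/a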
EDIT: The solution would be to use a WhitespaceAnalyzer and set the field to ANALYZED. The following code is a proof of concept in plain Lucene:
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.*;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Index and search with the same analyzer so both sides tokenize "n/a" identically.
IndexWriter writer = new IndexWriter(new RAMDirectory(), new IndexWriterConfig(Version.LUCENE_36, new WhitespaceAnalyzer(Version.LUCENE_36)));
Document doc = new Document();
doc.add(new Field("content", "Temp 0 New n/a", Store.YES, Index.ANALYZED));
writer.addDocument(doc);
writer.commit();

IndexReader reader = IndexReader.open(writer, true);
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser(Version.LUCENE_36, "content", new WhitespaceAnalyzer(Version.LUCENE_36));
TopDocs docs = searcher.search(parser.parse("+n/a"), 10);
System.out.println(docs.totalHits);
writer.close();
The output is: 1.
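To apply this in Compass itself, the change would look roughly as follows. This is only a sketch: it assumes Compass 2.x (where Index.ANALYZED is available) and that the whitespace analyzer is configured for the relevant analyzer group, e.g. via the setting compass.engine.analyzer.default.type=whitespace; adjust both to your setup.

// Sketch: mark the meta-data as analyzed so the configured whitespace
// analyzer is applied at index time and "n/a" stays a single token.
// Assumes compass.engine.analyzer.default.type=whitespace in the settings.
@SearchableProperty(name = "name")
@SearchableMetaData(name = "ordering_name", index = Index.ANALYZED)
private String name;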