Document document = new Document();
document.add(new Field("ID", "100", Field.Store.YES, Field.Index.NOT_ANALYZED));
document.add(new Field("TEMPLATE_CONTENT", "dummy Just {#var#} testing a spaceless {#var#} setup dummy",
Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(document);
I am indexing "dummy Just {#var#} testing a spaceless {#var#} setup dummy" with Lucene. When querying, I use one of the spaceless sentences below:
dummyJustatestingaspacelessfreakingsetupdummy
or
dummyjustatestingaspacelessfreakingsetupdummy
I am not able to get a single match against the TEMPLATE_CONTENT above.
I am using the code below to search:
query = new QueryParser(Version.LUCENE_36, "TEMPLATE_CONTENT", new StandardAnalyzer(Version.LUCENE_36))
.parse(serchQuery);
searcher = new IndexSearcher(index, true);
System.out.println("......query : " + query + "\n");
long startTime = System.currentTimeMillis();
results = searcher.search(query, 2);
long endTime = System.currentTimeMillis();
System.out.println("results time taken" + (endTime - startTime) + " ms");
for (ScoreDoc scoreDoc : results.scoreDocs) {
    System.out.println("scoreDoc : " + scoreDoc);
    Document document = searcher.doc(scoreDoc.doc);
    System.out.println("Found match: " + document.get("TEMPLATE_CONTENT") + "\n");
}
Please help me get at least one match.
Could you try this approach and see if it helps?
To match spaceless sentences at search time, the text has to be analyzed and indexed in a way that is consistent with the spaceless query. One way to do this is a custom analyzer that does not tokenize on whitespace and instead emits the whole field value as one token.
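import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
// Note: the single-argument createComponents and the no-argument Tokenizer constructor
// belong to the Lucene 5+ API. On Lucene 3.6 (used in the question) Analyzer exposes
// tokenStream(String, Reader) instead, so this class would need adapting there.
public class SpacelessAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer tokenizer = new SpacelessTokenizer();
        // Lowercase the single token so that matching is case-insensitive
        TokenStream filter = new LowerCaseFilter(tokenizer);
        return new TokenStreamComponents(tokenizer, filter);
    }
    private static class SpacelessTokenizer extends Tokenizer {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        // Set once the single token has been emitted for the current input
        private boolean done = false;
        @Override
        public boolean incrementToken() throws IOException {
            if (done) {
                return false;
            }
            clearAttributes();
            done = true;
            // Read the entire input and emit it as one token with whitespace removed
            char[] buffer = new char[256];
            int length;
            while ((length = input.read(buffer)) != -1) {
                for (int i = 0; i < length; i++) {
                    if (!Character.isWhitespace(buffer[i])) {
                        termAtt.append(buffer[i]);
                    }
                }
            }
            return termAtt.length() > 0;
        }
        @Override
        public void reset() throws IOException {
            super.reset();
            done = false;
        }
    }
}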
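Now you can use the analyzer when indexing your document. The analyzer has to be passed to the IndexWriter, otherwise it has no effect:
Analyzer analyzer = new SpacelessAnalyzer();
// Assumption: "index" is the Directory also used by the IndexSearcher in the question
IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(Version.LUCENE_36, analyzer));
document.add(new Field("TEMPLATE_CONTENT", "dummy Just {#var#} testing a spaceless {#var#} setup dummy",
Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
writer.addDocument(document);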
and when searching:
QueryParser queryParser = new QueryParser(Version.LUCENE_36, "TEMPLATE_CONTENT", new SpacelessAnalyzer());
Query query = queryParser.parse(searchQuery);
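With the same analyzer used for both indexing and searching, the indexed template and the spaceless query are each reduced to a single token. The {#var#} placeholders are still indexed literally, though, so a query in which they are replaced by real words (as in the examples in the question) may need extra handling before it matches.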
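To check what single-token matching does with the template from the question, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the analysis step removes whitespace and lowercases the text, and it treats each {#var#} as a wildcard for one or more characters. The class name SpacelessTemplateSketch and the helpers normalize, templateToPattern, and matches are hypothetical names made up for this illustration; they are not part of Lucene.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class SpacelessTemplateSketch {

    // Hypothetical helper: mimics the assumed analysis step
    // (all whitespace removed, text lowercased).
    static String normalize(String text) {
        return text.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    // Hypothetical helper: builds a regex from the normalized template in which
    // every {#var#} placeholder matches one or more arbitrary characters and
    // everything else is matched literally.
    static Pattern templateToPattern(String template) {
        String[] literals = normalize(template).split(Pattern.quote("{#var#}"), -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".+"); // stands in for one placeholder
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // True if the normalized query fits the template with placeholders filled in.
    static boolean matches(String template, String query) {
        return templateToPattern(template).matcher(normalize(query)).matches();
    }

    public static void main(String[] args) {
        String template = "dummy Just {#var#} testing a spaceless {#var#} setup dummy";
        String query = "dummyJustatestingaspacelessfreakingsetupdummy";

        // The normalized template keeps the literal placeholder text
        System.out.println(normalize(template));
        // Exact comparison of the two normalized strings
        System.out.println(normalize(template).equals(normalize(query)));
        // Comparison with placeholders treated as wildcards
        System.out.println(matches(template, query));
    }
}
```

The normalized template still contains the literal {#var#} text, so an exact single-term comparison with the spaceless query does not succeed. The regex step is only one possible way to handle the placeholders, shown here outside the index; whether it fits depends on how many templates have to be checked per query.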