As in the title: when I try to run a query, I get this error:
Exception in thread "main" java.lang.IllegalStateException: field "content" was indexed without position data; cannot run PhraseQuery (phrase=content:"to be not"~1)
    at org.apache.lucene.search.PhraseQuery$1.getPhraseMatcher(PhraseQuery.java:497)
    at org.apache.lucene.search.PhraseWeight.scorer(PhraseWeight.java:64)
    at org.apache.lucene.search.Weight.bulkScorer(Weight.java:166)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:731)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:655)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:649)
    at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:487)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:501)
    at ProximitySearch.main(ProximitySearch.java:81)
Here is my code:
public static void main(String[] args) throws IOException, ParseException {
    Analyzer analyzer = new StandardAnalyzer();
    List<KeyValuePairs> listOfDocs = new LinkedList<>();
    KeyValuePairs file1 = new KeyValuePairs("file1", "to be or not to be that is the question");
    KeyValuePairs file2 = new KeyValuePairs("file2", "make a long story short");
    KeyValuePairs file3 = new KeyValuePairs("file3", "see eye to eye");
    listOfDocs.add(file1);
    listOfDocs.add(file2);
    listOfDocs.add(file3);
    Path indexPath = Files.createTempDirectory("tempIndex");
    Directory directory = FSDirectory.open(indexPath);
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    for (KeyValuePairs listOfDoc : listOfDocs) {
        Document doc = new Document();
        String text = listOfDoc.getKey();
        System.out.println(text);
        String title = listOfDoc.getValue();
        doc.add(new StringField("content", text, Field.Store.YES));
        doc.add(new Field("title", title, TextField.TYPE_STORED));
        iwriter.addDocument(doc);
    }
    iwriter.close();
    // Now search the index:
    DirectoryReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // Parse a simple query that searches for "something that you want to search":
    QueryParser parser = new QueryParser("content", analyzer);
    Query query = parser.parse("\"to be not\"~1");
    ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
    System.out.println(Arrays.toString(Arrays.stream(hits).toArray()));
    System.out.println("Search terms found in :: " + hits.length + " files");
    ireader.close();
    directory.close();
    IOUtils.rm(indexPath);
}
I don't know what I am doing wrong.
Short Answer
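You cannot run proximity queries against data indexed in a StringField. You have to use a TextField instead, for example new TextField("content", text, Field.Store.YES) in place of new StringField("content", text, Field.Store.YES).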
You did not show us the definition for KeyValuePairs, so I have made some assumptions below about that.
(Small point: I would also suggest that you do not need to use LinkedList; you probably only need ArrayList.)
Longer Answer for More Background
Your problem is related to the field types you are using.
You have a document containing 2 fields:
content, which uses a StringField
title, which uses a TextField

An example of data in the content field is "to be or not to be that is the question".
You are attempting to run a proximity query against the content field.
Remember that, as the StringField Javadoc puts it, StringField data "is indexed but not tokenized: the entire String value is indexed as a single token."
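A single token means there is only ever one position, so position data would be meaningless, and it is not captured in the index. (Concretely, StringField indexes with IndexOptions.DOCS, which records only which documents contain each term, whereas TextField uses IndexOptions.DOCS_AND_FREQS_AND_POSITIONS.)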
That is why your query throws this error. A proximity query (a phrase query with a slop such as ~1) requires the data to be split into separate tokens, and it requires each token's position to be recorded in the index.
Therefore you need to use a TextField for that type of data.
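A toy model may help to show what tokenizing buys you. The sketch below is plain Java with no Lucene dependency; the class and method names are hypothetical, and the tokenizer (lowercase, then split on anything that is not a letter or digit) is only a rough approximation of StandardAnalyzer. It contrasts a TextField-like index, which records every position of every term, with a StringField-like one, which keeps the whole value as one term and no positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class PositionalIndexSketch {

    // Hypothetical, crude stand-in for StandardAnalyzer: lowercase, then split on
    // runs of characters that are not letters or digits. The real analyzer follows
    // Unicode word-break rules; this is only an approximation for illustration.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // TextField-like: one entry per distinct term, listing every position it occurs at.
    // TreeMap keeps terms sorted, similar to how the index dump lists them.
    static Map<String, List<Integer>> textFieldLike(String text) {
        Map<String, List<Integer>> index = new TreeMap<>();
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            index.computeIfAbsent(tokens.get(pos), k -> new ArrayList<>()).add(pos);
        }
        return index;
    }

    // StringField-like: the entire value is a single term, and no positions are kept.
    static List<String> stringFieldLike(String text) {
        return List.of(text);
    }

    public static void main(String[] args) {
        String text = "to be or not to be that is the question";
        textFieldLike(text).forEach((term, positions) ->
                System.out.println("term " + term + " pos " + positions));
        System.out.println("single term: " + stringFieldLike(text));
    }
}
```

With this approximation, the sample sentence gives "be" at positions 1 and 5 and "to" at 0 and 4, which lines up with what a real TextField records for it.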
When you use a TextField for "to be or not to be that is the question", the StandardAnalyzer causes the following data to be captured in the index:
field content
  term be
    doc 0
      freq 2
      pos 1
      pos 5
  term is
    doc 0
      freq 1
      pos 7
  term not
    doc 0
      freq 1
      pos 3
  term or
    doc 0
      freq 1
      pos 2
  term question
    doc 0
      freq 1
      pos 9
  term that
    doc 0
      freq 1
      pos 6
  term the
    doc 0
      freq 1
      pos 8
  term to
    doc 0
      freq 2
      pos 0
      pos 4
You can see that the index now contains the required position data. The proximity query needs this position data to decide whether the words in your query are close enough to each other to count as a match.
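To illustrate how positions can decide a proximity match, here is a simplified model in plain Java. It is not Lucene's actual sloppy-phrase matching code (which also handles repeated terms, scoring and efficiency), and the class and method names are hypothetical. Each chosen position is shifted back by the term's offset within the query phrase, and the document matches if the spread of the shifted positions is at most the slop. For "to be not"~1, "to" at 0 and "be" at 1 both shift to 0, while "not" at 3 shifts to 3 - 2 = 1, giving a spread of 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SloppyPhraseSketch {

    // All 0-based positions at which `term` occurs in the token list.
    static List<Integer> positionsOf(List<String> tokens, String term) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(term)) {
                positions.add(i);
            }
        }
        return positions;
    }

    // Smallest spread over all ways of picking one position per phrase term.
    // Returns Integer.MAX_VALUE if some phrase term does not occur at all.
    static int minSpread(List<String> tokens, List<String> phrase) {
        List<List<Integer>> candidates = new ArrayList<>();
        for (String term : phrase) {
            List<Integer> p = positionsOf(tokens, term);
            if (p.isEmpty()) {
                return Integer.MAX_VALUE;
            }
            candidates.add(p);
        }
        return search(candidates, 0, new int[phrase.size()]);
    }

    // Brute-force search: try every combination of positions (toy only, exponential).
    private static int search(List<List<Integer>> candidates, int i, int[] chosen) {
        if (i == chosen.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int k = 0; k < chosen.length; k++) {
                int shifted = chosen[k] - k; // undo the term's offset in the phrase
                min = Math.min(min, shifted);
                max = Math.max(max, shifted);
            }
            return max - min; // 0 means an exact, in-order phrase
        }
        int best = Integer.MAX_VALUE;
        for (int pos : candidates.get(i)) {
            if (alreadyUsed(chosen, i, pos)) {
                continue; // one token position cannot satisfy two phrase terms
            }
            chosen[i] = pos;
            best = Math.min(best, search(candidates, i + 1, chosen));
        }
        return best;
    }

    private static boolean alreadyUsed(int[] chosen, int upTo, int pos) {
        for (int k = 0; k < upTo; k++) {
            if (chosen[k] == pos) {
                return true;
            }
        }
        return false;
    }

    // Whitespace tokenization is a simplifying assumption, not StandardAnalyzer.
    static boolean matches(String text, String phrase, int slop) {
        List<String> tokens = List.of(text.toLowerCase(Locale.ROOT).split("\\s+"));
        List<String> terms = List.of(phrase.toLowerCase(Locale.ROOT).split("\\s+"));
        return minSpread(tokens, terms) <= slop;
    }

    public static void main(String[] args) {
        String text = "to be or not to be that is the question";
        System.out.println(matches(text, "to be not", 1)); // true: "not" is one position late
        System.out.println(matches(text, "to be not", 0)); // false: not an exact phrase
    }
}
```

Without any position data (the StringField case), there is nothing to compute this spread from, which is exactly what the exception is complaining about.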
And just for completeness, here is what you get in the index if you use StringField instead of TextField:
doc 0
  field 0
    name content
    type string
    value to be or not to be that is the question
As you can see, there is only one token, and no position data.
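A side effect of indexing the whole value as one token is that only the complete, exact value can ever be found. The plain-Java sketch below is a hypothetical model (not Lucene code) of a docs-only index, roughly analogous to IndexOptions.DOCS: each term maps to the set of document ids containing it, with no frequencies and no positions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SingleTokenSketch {

    // term -> ids of documents containing that term; no freqs, no positions
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // StringField-like: the entire value becomes one term, untouched by any analyzer.
    void addUntokenized(int docId, String value) {
        postings.computeIfAbsent(value, k -> new TreeSet<>()).add(docId);
    }

    // A term lookup succeeds only if the term equals a whole indexed value.
    Set<Integer> termQuery(String term) {
        return postings.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        SingleTokenSketch index = new SingleTokenSketch();
        index.addUntokenized(0, "to be or not to be that is the question");
        System.out.println(index.termQuery("to")); // prints []
        System.out.println(index.termQuery("to be or not to be that is the question")); // prints [0]
    }
}
```

This is why StringField suits identifiers, keys or tags that are matched as a whole, while free text that you want to search word by word (or by proximity) belongs in a TextField.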