Problem with proximity search in Lucene: field "content" was indexed without position data

As the title says, when I try to run a proximity query I get this error:

    Exception in thread "main" java.lang.IllegalStateException: field "content" was indexed without position data; cannot run PhraseQuery (phrase=content:"to be not"~1)
        at org.apache.lucene.search.PhraseQuery$1.getPhraseMatcher(PhraseQuery.java:497)
        at org.apache.lucene.search.PhraseWeight.scorer(PhraseWeight.java:64)
        at org.apache.lucene.search.Weight.bulkScorer(Weight.java:166)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:731)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:655)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:649)
        at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:487)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:501)
        at ProximitySearch.main(ProximitySearch.java:81)

Here is my code:

    public static void main(String[] args) throws IOException, ParseException {

        Analyzer analyzer = new StandardAnalyzer();

        List<KeyValuePairs> listOfDocs = new LinkedList<>();

        KeyValuePairs file1 = new KeyValuePairs("file1", "to be or not to be that is the question");
        KeyValuePairs file2 = new KeyValuePairs("file2", "make a long story short");
        KeyValuePairs file3 = new KeyValuePairs("file3", "see eye to eye");

        listOfDocs.add(file1);
        listOfDocs.add(file2);
        listOfDocs.add(file3);

        Path indexPath = Files.createTempDirectory("tempIndex");
        Directory directory = FSDirectory.open(indexPath);
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);
        for (KeyValuePairs listOfDoc : listOfDocs) {
            Document doc = new Document();
            String text = listOfDoc.getKey();
            System.out.println(text);
            String title = listOfDoc.getValue();
            doc.add(new StringField("content", text, Field.Store.YES));
            doc.add(new Field("title", title, TextField.TYPE_STORED));
            iwriter.addDocument(doc);
        }
        iwriter.close();

        // Now search the index:
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);

        // Parse a proximity query: the terms "to be not" within a slop of 1:
        QueryParser parser = new QueryParser("content", analyzer);
        Query query = parser.parse("\"to be not\"~1");

        ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
        System.out.println(Arrays.toString(Arrays.stream(hits).toArray()));
        System.out.println("Search terms found in :: " + hits.length + " files");

        ireader.close();
        directory.close();
        IOUtils.rm(indexPath);
    }

I don't know what I am doing wrong.

Solution

Short Answer

You cannot run proximity queries against data indexed in a StringField. You have to use a TextField.

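You did not show us the definition of KeyValuePairs, so I have made some assumptions about it: below, I assume that the sentence (for example, to be or not to be that is the question) is what ends up in the content field. Note that, as posted, your loop puts getKey() into content and getValue() into title; if getKey() returns the file name (such as "file1"), then the sentence is actually going into title, and content only holds the file name.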

(Small point: I would also suggest that you do not need to use LinkedList - you probably only need ArrayList.)
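Putting the short answer into code, here is a minimal, self-contained sketch of the fix. It is written against the Lucene 8.x API that your stack trace appears to come from, and it needs lucene-core and lucene-queryparser on the classpath. Because I don't have your KeyValuePairs class, it uses a plain String[][] of (file name, sentence) pairs instead, and it assumes the sentence belongs in content and the file name in title. The class name ProximitySearchFixed and the search helper are purely illustrative, and it uses an in-memory ByteBuffersDirectory rather than a temporary directory on disk.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ProximitySearchFixed {

    // Assumed stand-in for the question's KeyValuePairs: (file name, sentence).
    static final String[][] FILES = {
        {"file1", "to be or not to be that is the question"},
        {"file2", "make a long story short"},
        {"file3", "see eye to eye"}
    };

    /** Indexes FILES in memory and returns the titles of the documents matching queryText. */
    static List<String> search(String queryText) throws IOException, ParseException {
        try (Analyzer analyzer = new StandardAnalyzer();
             Directory directory = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
                for (String[] file : FILES) {
                    Document doc = new Document();
                    // The file name is an identifier, so a single untokenized token is fine.
                    doc.add(new StringField("title", file[0], Field.Store.YES));
                    // The sentence must be tokenized, with positions, for proximity queries.
                    doc.add(new TextField("content", file[1], Field.Store.YES));
                    writer.addDocument(doc);
                }
            }
            List<String> titles = new ArrayList<>();
            try (DirectoryReader reader = DirectoryReader.open(directory)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("content", analyzer).parse(queryText);
                for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                    titles.add(searcher.doc(hit.doc).get("title"));
                }
            }
            return titles;
        }
    }

    public static void main(String[] args) throws IOException, ParseException {
        List<String> titles = search("\"to be not\"~1");
        System.out.println(titles);
        System.out.println("Search terms found in :: " + titles.size() + " files");
    }
}
```

With content as a TextField, the query no longer throws, and only file1 should match: it is the only document where "to", "be" and "not" occur close enough together. Without the ~1 slop, the exact phrase "to be not" would find nothing, because "or" sits between "be" and "not".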

Longer Answer for More Background

Your problem is related to the field types you are using.

You have documents containing two fields:

- content, which uses a StringField
- title, which uses a TextField.

An example of the data in the content field is "to be or not to be that is the question".

You are attempting to run a proximity query against the content field.

Remember that, as the StringField Javadoc puts it, StringField data "is indexed but not tokenized: the entire String value is indexed as a single token."

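A single token means its position is always the only position, so position data would be meaningless and is therefore not captured in the index. (StringField's field type uses IndexOptions.DOCS, which records only which documents contain each term, without frequencies or positions.)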

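That is why your query throws that error. A proximity query (QueryParser turns "to be not"~1 into a PhraseQuery with a slop of 1, as the stack trace shows) requires the text to be split into separate tokens, and it needs each token's position to be recorded in the index.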
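If you want to check this directly, each field type exposes whether it is tokenized and which index options it uses. This small sketch (again assuming Lucene 8.x on the classpath; FieldTypeCheck is just an illustrative name) prints both for StringField and TextField:

```java
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class FieldTypeCheck {

    // Prints how a field type is indexed: whether it is tokenized and what postings data it keeps.
    static void describe(String name, FieldType type) {
        System.out.println(name + ": tokenized=" + type.tokenized()
                + ", indexOptions=" + type.indexOptions());
    }

    public static void main(String[] args) {
        describe("StringField", StringField.TYPE_STORED); // tokenized=false, indexOptions=DOCS
        describe("TextField", TextField.TYPE_STORED);     // tokenized=true, indexOptions=DOCS_AND_FREQS_AND_POSITIONS
    }
}
```

Only the TextField type includes positions (the POSITIONS part of DOCS_AND_FREQS_AND_POSITIONS), which is what PhraseQuery checks for before it runs.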

Therefore you need to use a TextField for that type of data.

When you use a TextField for "to be or not to be that is the question", the StandardAnalyzer causes the following data to be captured in the index:

    field content
      term be
        doc 0
          freq 2
          pos 1
          pos 5
      term is
        doc 0
          freq 1
          pos 7
      term not
        doc 0
          freq 1
          pos 3
      term or
        doc 0
          freq 1
          pos 2
      term question
        doc 0
          freq 1
          pos 9
      term that
        doc 0
          freq 1
          pos 6
      term the
        doc 0
          freq 1
          pos 8
      term to
        doc 0
          freq 2
          pos 0
          pos 4

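You can see that the index now contains the required position data. The proximity query uses it to decide whether the query terms are close enough to each other to match. For example, "not" is at position 3, one position further from "to be" (positions 0 and 1) than an exact phrase would require, and that one-position gap is exactly what a slop of 1 allows.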
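To illustrate why positions make this check possible, here is a simplified, self-contained sketch. It is not Lucene's implementation: it builds a one-document positional index by lowercasing and splitting on whitespace (an assumption that happens to match what StandardAnalyzer produces for this lowercase, punctuation-free sentence), then brute-forces the basic idea behind slop: subtract each term's offset in the query from its position, and require the spread (max minus min) of those shifted positions to be within the slop. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionalProximitySketch {

    /** Builds a one-document positional index: term -> positions where it occurs. */
    static Map<String, List<Integer>> positionalIndex(String text) {
        Map<String, List<Integer>> index = new HashMap<>();
        // Simplifying assumption: lowercase + whitespace split stands in for StandardAnalyzer.
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            index.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        return index;
    }

    /** True if some choice of positions for the query terms fits within the given slop. */
    static boolean sloppyPhraseMatch(Map<String, List<Integer>> index, String[] terms, int slop) {
        if (terms.length == 0) {
            return false;
        }
        return smallestSpread(index, terms, 0, Integer.MAX_VALUE, Integer.MIN_VALUE) <= slop;
    }

    // Brute force: shift each candidate position by the term's offset in the query, then
    // return the smallest (max - min) over all combinations, or Long.MAX_VALUE if a term is absent.
    private static long smallestSpread(Map<String, List<Integer>> index, String[] terms,
                                       int i, int min, int max) {
        if (i == terms.length) {
            return (long) max - min;
        }
        List<Integer> positions = index.get(terms[i]);
        if (positions == null) {
            return Long.MAX_VALUE;
        }
        long best = Long.MAX_VALUE;
        for (int pos : positions) {
            int shifted = pos - i;
            best = Math.min(best, smallestSpread(index, terms, i + 1,
                    Math.min(min, shifted), Math.max(max, shifted)));
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = positionalIndex("to be or not to be that is the question");
        // Same positions as in the index dump: to [0, 4], be [1, 5], not [3]
        System.out.println(index.get("to") + " " + index.get("be") + " " + index.get("not"));
        String[] query = {"to", "be", "not"};
        System.out.println(sloppyPhraseMatch(index, query, 0)); // false: "not" is one position too far
        System.out.println(sloppyPhraseMatch(index, query, 1)); // true: a slop of 1 absorbs the gap
    }
}
```

If the whole sentence were indexed as a single token, as with StringField, the map would contain just one entry at position 0, and there would be no per-word positions to compare at all.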

And just for completeness, here is what you get in the index if you use StringField instead of TextField:

    doc 0
      field 0
        name content
        type string
        value to be or not to be that is the question

As you can see, there is only one value (the entire string) and no position data.