I would like to know how to use proximity search with Whoosh. I have read the Whoosh documentation, which says that proximity search is possible with the class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None).
For example, I need to find "Hello World" in the index, where "Hello" is at a 5-word distance from "World".
At the moment I am using the following code, and it works fine with the normal parser.
from whoosh.index import open_dir
from whoosh.analysis import StandardAnalyzer
from whoosh.qparser import QueryParser
from whoosh.query import *
from whoosh import qparser

index_path = "/home/abhi/Desktop/CLIR/indexdir_test"
ix = open_dir(index_path)
query = 'Hello World'
ana = StandardAnalyzer(stoplist=stop_word)
qp = QueryParser("content", schema=ix.schema, termclass=Phrase)
q = qp.parse(query)
with ix.searcher() as s:
    results = s.search(qp, limit=5)
    for result in results:
        print(result['content'] + result['title'])
        print(result.score)
    print(len(results))
Please help me use the class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None) for proximity search, and show how to vary the distance between the words. Thanks in advance.
What you want is a slop factor of 5.
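To make the slop idea concrete, here is a small plain-Python sketch. It is not Whoosh's implementation, and within_slop is a made-up helper. It assumes slop is the maximum allowed difference in token positions (so slop=1 means the words are adjacent), which is how I read the Whoosh docs.

```python
def within_slop(tokens, first, second, slop):
    """Return True if `second` follows `first` within `slop` token positions."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    # Ordered match: `second` must come after `first`, at most `slop` positions later.
    return any(
        1 <= b - a <= slop
        for a in first_positions
        for b in second_positions
    )


# Made-up sample texts, tokenized naively on whitespace.
near = "hello big wide world".split()
far = "hello one two three four five six seven world".split()

print(within_slop(near, "hello", "world", 5))
print(within_slop(far, "hello", "world", 5))
```

In the first text "world" is 3 positions after "hello", so it falls inside a slop of 5. In the second it is 8 positions away, so it does not.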
A few points:
When you search, you must pass the query (q), not the query parser (qp): results = s.search(q, limit=5)
limit refers to the maximum number of documents to return, not the slop factor. Your limit=5 parameter says you want up to 5 search results back (in case you were thinking this is the slop).
You can remove termclass=Phrase; the parser builds a phrase query on its own when it sees a quoted string.
You can construct a phrase query two ways:
Query string: append ~ and the slop factor to the phrase for proximity search. If you want the phrase terms to be up to 5 words apart: "hello world"~5
SpanNear2 query: lets you structure the query programmatically, the way you want. Pass all your phrase terms as a list of Term objects and specify slop as a constructor parameter.
from whoosh.qparser import QueryParser
from whoosh.query import Term, spans
with ix.searcher() as s:
    # Option 1: Query string
    query = '"Hello World"~5'
    qp = QueryParser("content", schema=ix.schema)
    q = qp.parse(query)
    results = s.search(q, limit=5)

    # Option 2: SpanNear2
    # Terms are not run through the analyzer, so use the indexed form
    # (lowercase if the field uses the default analyzer).
    q = spans.SpanNear2([Term("content", "hello"), Term("content", "world")], slop=5)
    results = s.search(q, limit=5)