There are several overloads of the IndexSearcher.Search method in Lucene. Some of them require a "top n hits" argument; some don't (these are obsolete and will be removed in Lucene.NET 3.0).
Those that require the "top n" argument actually preallocate memory for the entire possible range of results. So when you can't even approximately estimate the number of results, the only option is to pass an arbitrarily large number to ensure that all query results are returned. This causes severe memory pressure and leaks due to LOH fragmentation.
Is there an official, non-obsolete way to search without passing a "top n" argument?
Thanks in advance, guys.
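I'm using Lucene.NET 2.9.2 as the reference point for this answer.
You could build a custom collector and pass it to one of the Search overloads that accept a Collector. The collector is called once per matching document, so nothing is preallocated for a "top n" range.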
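    using System;
    using System.Collections.Generic;
    using Lucene.Net.Index;
    using Lucene.Net.Search;

    public class AwesomeCollector : Collector {
        private readonly List<Int32> _docIds = new List<Int32>();
        // Minimum score (inclusive) a document needs in order to be collected.
        private readonly Single _lowerInclusiveScore;
        private Scorer _scorer;
        private Int32 _docBase;

        public AwesomeCollector(Single lowerInclusiveScore) {
            _lowerInclusiveScore = lowerInclusiveScore;
        }

        // Index-wide IDs of all collected documents.
        public IEnumerable<Int32> DocumentIds {
            get { return _docIds; }
        }

        public override void SetScorer(Scorer scorer) {
            _scorer = scorer;
        }

        // Called once per matching document; doc is relative to the current reader.
        public override void Collect(Int32 doc) {
            var score = _scorer.Score();
            if (_lowerInclusiveScore <= score)
                _docIds.Add(_docBase + doc);
        }

        // Called when the search moves to the next segment reader.
        public override void SetNextReader(IndexReader reader, Int32 docBase) {
            _docBase = docBase;
        }

        // Collection order does not matter because the IDs are only appended to a list.
        public override bool AcceptsDocsOutOfOrder() {
            return true;
        }
    }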
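To show how a collector gets passed to a search, here is a minimal sketch against Lucene.NET 2.9.2. It is an illustration rather than a definitive implementation: `AllDocsCollector`, the `body` field name, and the three sample documents are all made up for this example, and the collector deliberately ignores scores so the snippet stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper: collects every matching document ID and ignores scores.
public class AllDocsCollector : Collector {
    private readonly List<int> _docIds = new List<int>();
    private int _docBase;

    public IList<int> DocumentIds {
        get { return _docIds; }
    }

    public override void SetScorer(Scorer scorer) {
        // Scores are not needed for this sketch.
    }

    public override void Collect(int doc) {
        // doc is relative to the current reader, so add the reader's base offset.
        _docIds.Add(_docBase + doc);
    }

    public override void SetNextReader(IndexReader reader, int docBase) {
        _docBase = docBase;
    }

    public override bool AcceptsDocsOutOfOrder() {
        return true;
    }
}

public static class Program {
    public static void Main() {
        // Build a tiny in-memory index (sample data, assumed for illustration).
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        var writer = new IndexWriter(directory, analyzer, true,
                                     IndexWriter.MaxFieldLength.UNLIMITED);
        foreach (var text in new[] { "red apple", "green apple", "yellow banana" }) {
            var document = new Document();
            document.Add(new Field("body", text, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(document);
        }
        writer.Close();

        // Search without a "top n" argument: the collector receives every hit.
        var searcher = new IndexSearcher(directory, true);
        var collector = new AllDocsCollector();
        searcher.Search(new TermQuery(new Term("body", "apple")), collector);

        // Write the stored body of each collected document.
        foreach (var docId in collector.DocumentIds)
            Console.WriteLine(searcher.Doc(docId).Get("body"));

        searcher.Close();
    }
}
```

The list grows only as hits arrive, so memory use follows the actual number of matches instead of a guessed upper bound. The hits come back unsorted, though. If you need them ranked, the collector would also have to record each score and sort afterwards.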