elasticsearch filter lucene hibernate-search hibernate-search-6

Does the order in which BooleanPredicateClausesStep filters are chained matter?

I have the following method, which builds a BooleanPredicateClausesStep for use in a query.

private BooleanPredicateClausesStep<?> getJournalAndSpatialSearchCriteria(GeoFilter geoFilter, SearchPredicateFactory factory, Boolean includeJournalsWithStatusFinished) {
    SearchPredicate journalLocationMustResideWithinRadius = getJournalsContainedWithinRadiusPredicate(geoFilter, factory);
    SearchPredicate mustOrShouldBeOfStatus = getSubmissionStatusConditionPredicate(includeJournalsWithStatusFinished, factory);
    return factory.bool()
        .filter( journalLocationMustResideWithinRadius )
        .filter( factory.match().field( "deleted" ).matching( "false" ) )
        .filter( mustOrShouldBeOfStatus )
        .filter( factory.match().field( "containsHarvestEntry" ).matching( "true" ) )
        .filter( factory.match().field( "grownOutdoors" ).matching( "true" ) );
}

It contains one spatial search predicate that checks whether journals fall within a predefined circular geographical area or not. All the other filters are simple ones that only check whether a certain field matches a value or not.

My question is: are all these filters applied sequentially or all at once? To put it differently: would Lucene first fetch all of the objects that fall within the defined geographical area before checking whether they are deleted, or does it check both simultaneously? The Hibernate Search documentation doesn't say anything about the order in which filters are processed.

Solution

In short: no, the order in which filters are declared within a single boolean predicate doesn't matter. The results will be the same regardless of order, and performance will most likely be the same regardless of order.

Detailed answer:

The order of clauses in a given boolean predicate doesn't matter in the sense that the predicate will match the same documents regardless of order.

Regarding implementation, it's a bit complex, but roughly speaking, each filter is turned into a DocIdSetIterator, and these iterators are then combined so that the combined iterator only visits documents returned by all of them. That means the iterators are advanced in turn, one after the other.

However, there are optimizations that allow iterators to "skip through" to the document matched by the previous iterator, so order might matter, but for performance only: if you have a filter that is quick and matches very few documents, it's better to execute it first.

But... Lucene often has knowledge of the "cost" of each filter/iterator, and will automatically change the order of iterators to execute the less costly ones first (see org.apache.lucene.search.ConjunctionDISI#createConjunction in Lucene 8.11).
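
To make the mechanism described above more concrete, here is a minimal sketch of a conjunction over sorted document IDs. It is a toy model, not Lucene's actual code: ConjunctionSketch, ArrayIterator and intersect are hypothetical names invented for this illustration, and "cost" is simply assumed to be the number of documents each filter matches. The sketch sorts the iterators by cost, lets the cheapest one lead, and has the others skip forward to the lead's current document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model of a conjunction of filters; NOT Lucene's implementation.
class ConjunctionSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a DocIdSetIterator over a sorted array of doc IDs. */
    static final class ArrayIterator {
        private final int[] docs; // sorted ascending, no duplicates
        private int pos = 0;

        ArrayIterator(int[] docs) { this.docs = docs; }

        /** Assumption: cost is just the number of matching documents. */
        long cost() { return docs.length; }

        /** Skips forward to the first doc ID >= target and returns it. */
        int advance(int target) {
            int idx = Arrays.binarySearch(docs, pos, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Returns the doc IDs present in every posting list, whatever their order. */
    static List<Integer> intersect(List<int[]> postings) {
        List<ArrayIterator> iterators = new ArrayList<>();
        for (int[] p : postings) {
            iterators.add(new ArrayIterator(p));
        }
        // Reorder so the cheapest iterator leads, regardless of declaration order.
        iterators.sort(Comparator.comparingLong(ArrayIterator::cost));
        ArrayIterator lead = iterators.get(0);

        List<Integer> matches = new ArrayList<>();
        int doc = lead.advance(0);
        outer:
        while (doc != NO_MORE_DOCS) {
            for (int i = 1; i < iterators.size(); i++) {
                int other = iterators.get(i).advance(doc);
                if (other != doc) {
                    // This iterator skipped past doc: the lead has to catch up.
                    doc = (other == NO_MORE_DOCS) ? NO_MORE_DOCS : lead.advance(other);
                    continue outer;
                }
            }
            matches.add(doc); // every iterator agrees on this document
            doc = lead.advance(doc + 1);
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] withinRadius = {1, 3, 5, 7, 9, 11, 13, 15};
        int[] notDeleted = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        int[] hasStatus = {5, 9, 20};
        // Two different declaration orders produce the same matches.
        System.out.println(intersect(List.of(withinRadius, notDeleted, hasStatus)));
        System.out.println(intersect(List.of(hasStatus, notDeleted, withinRadius)));
    }
}
```

In this model the declaration order of the three filters has no effect on the result, and because of the sort it has no effect on which iterator leads either, which is the behaviour the answer describes.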

All that to say: even for performance, while order matters internally, it shouldn't matter to you. So, don't even think about it :)