I'm using Solr 6.6.0, and here are the documents in the collection:
{"id":1,"content":"test1"}
{"id":2,"content":"test2"}
{"id":3,"content":"test3"}
Say I want to retrieve the documents that contain neither "test1" nor "test2". According to the "Grouping Terms to Form Sub-Queries" section of the reference guide, it seems legal to write the query string in the following way:
content:((NOT "test1") AND (NOT "test2"))
The query is expected to return only document #3, but the actual result is empty.
Alternatively, if the query is changed to the following, without parentheses surrounding the NOT expressions, the expected result is returned:
content:(NOT "test1" AND NOT "test2")
My question is: why does the first query string not work as expected?
Solr currently checks for a "pure negative" query and inserts *:* (which matches all documents), so the latter format (the one without parentheses) works correctly.
See the code snippet below from org.apache.solr.search.QueryUtils:
/** Fixes a negative query by adding a MatchAllDocs query clause.
 * The query passed in *must* be a negative query.
 */
public static Query fixNegativeQuery(Query q) {
  BooleanQuery newBq = (BooleanQuery)q.clone();
  newBq.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
  return newBq;
}
So NOT "test" is transformed by Solr into (*:* NOT "test").
But Solr checks only the top-level query, so a parenthesized sub-query like (NOT "test1") is not changed, because the pure negative query is not at the top level. Left as it is, such a sub-query has nothing to exclude documents from and matches no documents, so ANDing two of them yields an empty result.
This is why the former format (the one with parentheses) does not work as expected.
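To make the difference concrete, here is a minimal toy model in plain Java. It is not Solr or Lucene code: the names PureNegativeDemo, Bool, term, matchAll and fixTopLevel are all hypothetical, and the evaluation rule (MUST clauses select candidates, MUST_NOT clauses only remove from them) is a simplified assumption about how a boolean query behaves.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of boolean query evaluation; illustrative only, not Solr code. */
public class PureNegativeDemo {

    /** A query maps the indexed documents (id -> content) to the matching ids. */
    public interface Query {
        Set<Integer> match(Map<Integer, String> docs);
    }

    public static Query term(String value) {
        return docs -> {
            Set<Integer> hits = new TreeSet<>();
            docs.forEach((id, content) -> { if (content.equals(value)) hits.add(id); });
            return hits;
        };
    }

    public static Query matchAll() {
        return docs -> new TreeSet<>(docs.keySet());
    }

    /** Candidates come from MUST clauses only; MUST_NOT clauses only remove. */
    public static final class Bool implements Query {
        final List<Query> must = new ArrayList<>();
        final List<Query> mustNot = new ArrayList<>();

        public Bool must(Query q) { must.add(q); return this; }
        public Bool mustNot(Query q) { mustNot.add(q); return this; }
        public boolean isPureNegative() { return must.isEmpty() && !mustNot.isEmpty(); }

        @Override
        public Set<Integer> match(Map<Integer, String> docs) {
            Set<Integer> hits = new TreeSet<>();
            if (must.isEmpty()) return hits; // nothing selected, nothing to exclude from
            hits.addAll(must.get(0).match(docs));
            for (Query q : must.subList(1, must.size())) hits.retainAll(q.match(docs));
            for (Query q : mustNot) hits.removeAll(q.match(docs));
            return hits;
        }
    }

    /** Mimics the top-level-only fix: nested queries are never inspected. */
    public static Query fixTopLevel(Query q) {
        if (q instanceof Bool && ((Bool) q).isPureNegative()) {
            ((Bool) q).must(matchAll()); // mutates the outermost query in place
        }
        return q;
    }

    /** Models content:(NOT "test1" AND NOT "test2"). */
    public static Query flat() {
        return new Bool().mustNot(term("test1")).mustNot(term("test2"));
    }

    /** Models content:((NOT "test1") AND (NOT "test2")). */
    public static Query nested() {
        return new Bool()
                .must(new Bool().mustNot(term("test1")))
                .must(new Bool().mustNot(term("test2")));
    }

    /** Models the nested form with an explicit match-all in each group. */
    public static Query explicit() {
        return new Bool()
                .must(new Bool().must(matchAll()).mustNot(term("test1")))
                .must(new Bool().must(matchAll()).mustNot(term("test2")));
    }

    public static Map<Integer, String> sampleDocs() {
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(1, "test1");
        docs.put(2, "test2");
        docs.put(3, "test3");
        return docs;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = sampleDocs();
        System.out.println("flat:     " + fixTopLevel(flat()).match(docs));   // [3]
        System.out.println("nested:   " + fixTopLevel(nested()).match(docs)); // []
        System.out.println("explicit: " + explicit().match(docs));            // [3]
    }
}
```

In this model the flat query is pure negative at the top level, so the fix applies and document 3 is returned. The nested query has two MUST clauses at the top level, so the fix is skipped and each inner group stays pure negative and matches nothing.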
So we can conclude that, in general, the proper way to use the NOT operator is the (*:* NOT some_expression) form, instead of a bare NOT some_expression.
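Applied to the query in the question, that form can be built with a small helper. This is only a sketch: NegativeClause and not are hypothetical names, and the field name is written out in each clause as an assumption to keep the query string unambiguous.

```java
/** Builds query strings in the (*:* NOT some_expression) form; illustrative only. */
public class NegativeClause {

    /** Wraps a clause so that the exclusion is applied to an explicit match-all query. */
    public static String not(String clause) {
        return "(*:* NOT " + clause + ")";
    }

    public static void main(String[] args) {
        String q = not("content:\"test1\"") + " AND " + not("content:\"test2\"");
        // Prints: (*:* NOT content:"test1") AND (*:* NOT content:"test2")
        System.out.println(q);
    }
}
```

Following the explanation above, each parenthesized group now has a positive clause to exclude from, so the query should no longer depend on the top-level check.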