We're running Solr 3.4 and have a relatively small index of 90,000 documents or so. These documents are split over several logical sources, and so each search will have an applied filter query for a particular source, e.g.:
?q=<query>&fq=source:<source>
where source is a classic string field. We're using edismax and have a default search field, text.
We are currently seeing q=* taking on average 20 times longer to run than q=*:*. The difference is quite noticeable, with *:* taking 100ms and * taking up to 3500ms. A search for a common word in the document set (matching nearly 50% of all documents) will return a result in less than 200ms.
Looking at the queries with debugQuery on, we can see that * is parsed to a DisjunctionMaxQuery((text:*)), while *:* is parsed to a MatchAllDocsQuery(*:*). This makes sense, but I still don't feel like it accounts for a slowdown of this magnitude (a slowdown of 2000% over something that matches 50% of the documents).
What could be causing this? Is there anything we can tweak?
When you pass just *, you are telling Solr to check every value in the field and match it against *, and that is a lot of work. However, when you use *:*, you are asking Solr to give you everything and skip any matching.
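To make the difference concrete, here is a toy model in Python. It is only an illustration under simplifying assumptions, not Lucene's actual code: the inverted index is a plain dictionary, and the function names (build_index, wildcard_all, match_all) are made up for this sketch. The wildcard path has to visit every term in the field and merge its postings, while the match-all path never touches the terms at all.

```python
from collections import defaultdict


def build_index(docs):
    """Toy inverted index: map each term to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def wildcard_all(index):
    """Toy version of text:* : visit every term and union its postings."""
    matched = set()
    terms_visited = 0
    for postings in index.values():
        terms_visited += 1
        matched |= postings
    return matched, terms_visited


def match_all(num_docs):
    """Toy version of *:* : no term lookups, just every doc ID."""
    return set(range(num_docs)), 0


docs = ["solr is fast", "lucene powers solr", "wildcards walk every term"]
index = build_index(docs)
```

In this model the work done by wildcard_all grows with the number of distinct terms in the field, which for a full-text field is usually far larger than the number of documents.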
Solr/Lucene is optimized to handle *:* quickly and efficiently!
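If users can type a bare * into the search box, one possible workaround is to rewrite it to *:* before the request reaches Solr. The following is a minimal sketch of that idea; build_search_params is a hypothetical helper, and the source value "news" is just a placeholder. Note that the two queries are not strictly equivalent: text:* only matches documents that have a value in text, whereas *:* matches every document, so this assumes that difference does not matter for your data.

```python
from urllib.parse import urlencode


def build_search_params(user_query, source):
    """Build Solr request parameters, swapping a bare '*' for '*:*'.

    Hypothetical helper: the parameter names (q, fq, defType) are standard
    Solr request parameters, but the rewrite rule is an assumption of this sketch.
    """
    q = user_query.strip()
    if q == "*":
        # Bare wildcard: use the match-all query instead of text:*
        q = "*:*"
    return {"q": q, "fq": f"source:{source}", "defType": "edismax"}


# Example: produce the query string for a "show everything" search.
query_string = urlencode(build_search_params("*", "news"))
```

All other queries pass through unchanged, so normal keyword searches still go through edismax as before.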