I am using MongoDB and noticed that we should not use the count()
function with a filter query, because it produces this warning:
@deprecated Use countDocuments or estimatedDocumentCount
http://mongodb.github.io/node-mongodb-native/3.1/api/Collection.html#count
The suggested alternative, countDocuments(), uses aggregation, which in certain cases proves to be 20 or more times slower for us (0.5 seconds with count() on 10k filtered documents versus 20 seconds with countDocuments()), which makes the response time unbearable. It is claimed that the count() function might be inaccurate, although I have never witnessed this myself. In what cases can it be inaccurate? Is there an article explaining why it was deprecated in the first place? What happens behind the scenes of count() that makes it worth deprecating in favor of an alternative that uses aggregation (countDocuments())?
For reference, the countDocuments() implementation looks like this:
class CountDocumentsOperation extends AggregateOperation {
  constructor(collection, query, options) {
    // Filter first, then apply the optional skip/limit window
    const pipeline = [{ $match: query }];
    if (typeof options.skip === 'number') {
      pipeline.push({ $skip: options.skip });
    }
    if (typeof options.limit === 'number') {
      pipeline.push({ $limit: options.limit });
    }
    // Collapse all remaining documents into one group and sum them
    pipeline.push({ $group: { _id: 1, n: { $sum: 1 } } });
    super(collection, pipeline, options);
  }
}
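To make the aggregation behind countDocuments() easier to inspect, here is a minimal sketch that builds the same pipeline shape as the excerpt above. The helper name buildCountPipeline is hypothetical and is not part of the driver; it only mirrors the constructor logic shown.

```javascript
// Hypothetical helper (not a driver function): mirrors the pipeline
// that the CountDocumentsOperation excerpt assembles.
function buildCountPipeline(query = {}, options = {}) {
  // Filter first so only matching documents reach later stages
  const pipeline = [{ $match: query }];
  if (typeof options.skip === 'number') {
    pipeline.push({ $skip: options.skip });
  }
  if (typeof options.limit === 'number') {
    pipeline.push({ $limit: options.limit });
  }
  // Put every remaining document in a single group and add 1 per document
  pipeline.push({ $group: { _id: 1, n: { $sum: 1 } } });
  return pipeline;
}
```

Passing the result to collection.aggregate(...) should yield at most one document whose n field is the count (an empty result means zero matches). Running the same pipeline by hand can help show where the time is spent for a slow filter.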
If you have a sharded cluster and chunks are being moved from one shard to another, the documents in those chunks can be counted twice (so the count can be up to 2x the real value).
The deprecation happened in conjunction with the sharded transaction work for MongoDB 4.2, since count() couldn't be guaranteed to return the correct result in a transaction. The options were to make count() actually count documents (which is slow) or to prohibit it in transactions. The latter option was chosen, leading to the countDocuments() and estimatedDocumentCount() pair.
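As a rough illustration of how the pair divides the work, the sketch below picks between the two methods based on whether a filter is supplied. The wrapper name countFor is hypothetical; it assumes a Collection object from the Node.js driver, and it relies on estimatedDocumentCount() not accepting a filter, so it is only used for whole-collection counts.

```javascript
// Hypothetical wrapper (not part of the driver).
// Assumes `collection` exposes countDocuments() and estimatedDocumentCount(),
// as the Node.js driver's Collection does.
function countFor(collection, filter = {}, options = {}) {
  if (Object.keys(filter).length === 0) {
    // No filter: the metadata-based estimate is fast but approximate
    return collection.estimatedDocumentCount();
  }
  // With a filter: an accurate count via the aggregation pipeline
  return collection.countDocuments(filter, options);
}
```

With a real collection both branches return a promise, so the caller would await the result. Whether an estimate is acceptable for the unfiltered case depends on the application, so treat the branch condition as an assumption to adjust.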