I have a MongoDB database where are storage different files, those files can be png, jpg, pdf, here an example:
{
"_id" : "id of the document",
"metadata" : {
"s" : "documents",
"u" : "id of the owner",
"d" : ""
},
"filename" : "El-jugador-Fedor-Dostoyevski.pdf",
"contentType" : "application/pdf",
}
I deleted some fields not relevant, the type of the document is given by the contentType
field, what I want is to get the count of each type of file using Scala and ReactiveMongo
I already did it but doing three consultations, in this way:
def contentTypeStats(implicit ec: ExecutionContext): Future[ContentTypesDTO] = {
collectionFactory.collection().flatMap(collection => {
val filterDocuments = BSONDocument("metadata.s" -> BSONDocument("$ne" -> "thumbnail"))//don't want count thumbnails, only documents
val filterPNG = BSONDocument(filterDocuments, "contentType" -> "image/png")
val filterJPG = BSONDocument(filterDocuments, "contentType" -> "image/jpeg")
val filterPDF = BSONDocument(filterDocuments, "contentType" -> "application/pdf")
val countPNGFuture: Future[Long] = collection.count(Some(filterPNG))
val countJPGFuture: Future[Long] = collection.count(Some(filterJPG))
val countPDFFuture: Future[Long] = collection.count(Some(filterPDF))
for {
countPNG <- countPNGFuture
countJPG <- countJPGFuture
countPDF <- countPDFFuture
} yield {
ContentTypesDTO(
pngCount = countPNG,
jpgCount = countJPG,
pdfCount = countPDF
)
}
})
}
I would like to do this with only one consult, in MongoDB I do it this way:
db.getCollection('myCollection').aggregate([
{$match: {'metadata.s': {$ne: 'thumbnail'}}},
{$group: {_id: "$contentType", count: {$sum: 1}} }
])
That return me this:
/* 1 */
{
"_id" : "image/png",
"count" : 5.0
}
/* 2 */
{
"_id" : "application/pdf",
"count" : 9.0
}
/* 3 */
{
"_id" : "image/jpeg",
"count" : 8.0
}
I try this way:
def contentTypeStats(implicit ec: ExecutionContext): Future[ContentTypesDTO] = {
collectionFactory.collection().flatMap(collection => {
import collection.AggregationFramework._
val result: Future[Option[BSONDocument]] = collection.aggregatorContext[BSONDocument](
pipeline = List(
Match(BSONDocument("metadata.s" -> BSONDocument("$ne" -> "thumbnail"))),
Group(BSONDocument("_id" -> "$contentType"))("count" -> SumAll)
)
).prepared.cursor.headOption
result.map {
case Some(doc) =>
println(doc.getAsOpt[String]("_id"))//here always return None
ContentTypesDTO(
pngCount = doc.getAsOpt[Long]("count").getOrElse(0L),
jpgCount = doc.getAsOpt[Long]("count").getOrElse(0L),
pdfCount = doc.getAsOpt[Long]("count").getOrElse(0L)
)//all have the same number
}
})
}
That method return None
when ask for the _id
and the count field give randomly some of the previous results (5, 8, 9), it should be a way to access the specific count field of each _id
that should be either image/png
or image/jpeg
or application/pdf
but how if I can get the _id
There were a couple of problems in my solution
val result: Future[Option[BSONDocument]] = collection.aggregatorContext[BSONDocument](
pipeline = List(
Match(BSONDocument("metadata.s" -> BSONDocument("$ne" -> "thumbnail"))),
Group(BSONDocument("_id" -> "$contentType"))("count" -> SumAll)
)
).prepared.cursor.headOption
here I was mapping the _id
as a BSONDocument
, that's why a could not get that _id
, so the solution was to mapped as a BSONString
in this way
Group(BSONString("$contentType"))("count" -> SumAll)
The Group
method always create the _id
field with the first parameter.
The second problem was returning the result, here .prepared.cursor.headOption
only return me the first BSONDocument from the created group. To fixed using the Cursor class from reactivemongo
.prepared.cursor.collect[List](-1, Cursor.FailOnError[List[BSONDocument]]())
that return me a List of BSONDocument
After that also change the ContentTypesDTO
to just be
case class ContentTypesDTO(contentType: String,
count: Long)
And using map on the result
to get a Seq[ContentTypesDTO]
Here the final solution:
def contentTypeStats(implicit ec: ExecutionContext): Future[Seq[ContentTypeDetailsDTO]] = {
collectionFactory.collection().flatMap(collection => {
import collection.AggregationFramework.{Group, Match, SumAll}
val result: Future[List[BSONDocument]] = collection.aggregatorContext[BSONDocument](
List(
Match(BSONDocument("metadata.s" -> BSONDocument("$ne" -> "thumbnail"))),
Group(BSONString("$contentType"))("count" -> SumAll))
).prepared.cursor.collect[List](-1, Cursor.FailOnError[List[BSONDocument]]())
result.map(docList => {
docList.map {
doc =>
ContentTypeDetailsDTO(
contentType = doc.getAsOpt[String]("_id").getOrElse(""),
count = doc.getAsOpt[Long]("count").getOrElse(0L)
)
}
})
})
}
That method returns this:
[
{
"contentType": "application/pdf",
"count": 9
},
{
"contentType": "image/svg",
"count": 1
},
{
"contentType": "image/png",
"count": 4
},
{
"contentType": "image/jpeg",
"count": 8
}
]