mongodbscalamongodb-queryreactivemongo

Reactive mongo group elements and count them


I have a MongoDB database where are storage different files, those files can be png, jpg, pdf, here an example:

{
    "_id" : "id of the document",
    "metadata" : {
        "s" : "documents",
        "u" : "id of the owner",
        "d" : ""
    },
    "filename" : "El-jugador-Fedor-Dostoyevski.pdf",
    "contentType" : "application/pdf",
}

I deleted some fields not relevant, the type of the document is given by the contentType field, what I want is to get the count of each type of file using Scala and ReactiveMongo

I already did it but doing three consultations, in this way:

def contentTypeStats(implicit ec: ExecutionContext): Future[ContentTypesDTO] = {
    collectionFactory.collection().flatMap(collection => {

      val filterDocuments = BSONDocument("metadata.s" -> BSONDocument("$ne" -> "thumbnail"))//don't want count thumbnails, only documents
      
      val filterPNG = BSONDocument(filterDocuments, "contentType" -> "image/png")
      val filterJPG = BSONDocument(filterDocuments, "contentType" -> "image/jpeg")
      val filterPDF = BSONDocument(filterDocuments, "contentType" -> "application/pdf")

      val countPNGFuture: Future[Long] = collection.count(Some(filterPNG))
      val countJPGFuture: Future[Long] = collection.count(Some(filterJPG))
      val countPDFFuture: Future[Long] = collection.count(Some(filterPDF))

      for {
        countPNG <- countPNGFuture
        countJPG <- countJPGFuture
        countPDF <- countPDFFuture
      } yield {
        ContentTypesDTO(
          pngCount = countPNG,
          jpgCount = countJPG,
          pdfCount = countPDF
        )
      }

    })
  }

I would like to do this with only one consult, in MongoDB I do it this way:

db.getCollection('myCollection').aggregate([
    {$match: {'metadata.s': {$ne: 'thumbnail'}}},
    {$group: {_id: "$contentType", count: {$sum: 1}} }
])

That return me this:

/* 1 */
{
    "_id" : "image/png",
    "count" : 5.0
}

/* 2 */
{
    "_id" : "application/pdf",
    "count" : 9.0
}

/* 3 */
{
    "_id" : "image/jpeg",
    "count" : 8.0
}

I try this way:

def contentTypeStats(implicit ec: ExecutionContext): Future[ContentTypesDTO] = {
    collectionFactory.collection().flatMap(collection => {

      import collection.AggregationFramework._
      val result: Future[Option[BSONDocument]] = collection.aggregatorContext[BSONDocument](
        pipeline = List(
          Match(BSONDocument("metadata.s" -> BSONDocument("$ne" -> "thumbnail"))),
          Group(BSONDocument("_id" -> "$contentType"))("count" -> SumAll)
        )
      ).prepared.cursor.headOption

      result.map {
        case Some(doc) =>
          println(doc.getAsOpt[String]("_id"))//here always return None
          ContentTypesDTO(
            pngCount = doc.getAsOpt[Long]("count").getOrElse(0L),
            jpgCount = doc.getAsOpt[Long]("count").getOrElse(0L),
            pdfCount = doc.getAsOpt[Long]("count").getOrElse(0L)
          )//all have the same number
      }
    })
  }

That method return None when ask for the _id and the count field give randomly some of the previous results (5, 8, 9), it should be a way to access the specific count field of each _id that should be either image/png or image/jpeg or application/pdf but how if I can get the _id


Solution

  • There were a couple of problems in my solution

    val result: Future[Option[BSONDocument]] = collection.aggregatorContext[BSONDocument](
            pipeline = List(
              Match(BSONDocument("metadata.s" -> BSONDocument("$ne" -> "thumbnail"))),
              Group(BSONDocument("_id" -> "$contentType"))("count" -> SumAll)
            )
          ).prepared.cursor.headOption
    

    here I was mapping the _id as a BSONDocument, that's why a could not get that _id, so the solution was to mapped as a BSONString in this way

    Group(BSONString("$contentType"))("count" -> SumAll)
    

    The Group method always create the _id field with the first parameter. The second problem was returning the result, here .prepared.cursor.headOption only return me the first BSONDocument from the created group. To fixed using the Cursor class from reactivemongo .prepared.cursor.collect[List](-1, Cursor.FailOnError[List[BSONDocument]]()) that return me a List of BSONDocument After that also change the ContentTypesDTO to just be

    case class ContentTypesDTO(contentType: String,
                               count: Long)
    

    And using map on the result to get a Seq[ContentTypesDTO] Here the final solution:

    def contentTypeStats(implicit ec: ExecutionContext): Future[Seq[ContentTypeDetailsDTO]] = {
        collectionFactory.collection().flatMap(collection => {
    
          import collection.AggregationFramework.{Group, Match, SumAll}
    
          val result: Future[List[BSONDocument]] = collection.aggregatorContext[BSONDocument](
            List(
              Match(BSONDocument("metadata.s" -> BSONDocument("$ne" -> "thumbnail"))),
              Group(BSONString("$contentType"))("count" -> SumAll))
          ).prepared.cursor.collect[List](-1, Cursor.FailOnError[List[BSONDocument]]())
    
          result.map(docList => {
            docList.map {
              doc =>
                ContentTypeDetailsDTO(
                  contentType = doc.getAsOpt[String]("_id").getOrElse(""),
                  count = doc.getAsOpt[Long]("count").getOrElse(0L)
                )
            }
          })
        })
      }
    

    That method returns this:

    [
      {
        "contentType": "application/pdf",
        "count": 9
      },
      {
        "contentType": "image/svg",
        "count": 1
      },
      {
        "contentType": "image/png",
        "count": 4
      },
      {
        "contentType": "image/jpeg",
        "count": 8
      }
    ]