mongodb aggregation studio3t

Aggregation for counting the occurrences of a substring in a string in MongoDB


I am new to MongoDB, so this might be a beginner question.

I want to count how many times "lupoK" occurs in the message field, for example "message" : "first lupoK lupoK", using aggregation in MongoDB. I am using the Studio 3T interface.

My document structure is:

{ 
    "_id" : ObjectId("5df9c780b05196da93be262b"), 
    "id" : "61a4c53a-aa99-4336-ab4f-07bb7f618889", 
    "time" : "00:00:45", 
    "username" : "siul", 
    "message" : "***first lupoK lupoK***", 
    "emoticon_place" : [
        {
            "_id" : "128428", 
            "begin" : NumberInt(6), 
            "end" : NumberInt(10)
        }
    ], 
    "fragments" : [
        {
            "text" : "first "
        }, 
        {
            "emoticon" : {
                "emoticon_id" : "128428", 
                "emoticon_set_id" : ""
            }, 
            "text" : "***lupoK***"
        },
        {
            "emoticon" : {
                "emoticon_id" : "128428", 
                "emoticon_set_id" : ""
            }, 
            "text" : "***lupoK***"
        }
    ]
}

Thanks in advance!!!


Solution

  • This works in the mongo shell (assuming the message field exists and is a string):

    db.test.aggregate( [
      { 
          $project: { 
              _id: 0, 
              message: 1, 
              count: { 
                  $subtract: [ 
                      { $size: { $split: [ "$message", "lupoK" ] } }, 1 
                  ] 
              } 
          } 
      }
    ] )
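
    The pipeline above relies on message being a string. If some documents lack the field or store another type, the stage can fail, since $size rejects a non-array argument. The following is only a sketch of one way to guard against that, using $type and $cond; returning 0 for such documents is an assumption, not part of the original answer.

```javascript
// Sketch: same count as above, but guarded against a missing or non-string "message".
// The fallback value 0 is an assumed choice; null would work equally well.
const pipeline = [
  {
    $project: {
      _id: 0,
      message: 1,
      count: {
        $cond: [
          // $type yields "string" only when the field exists and holds a string
          { $eq: [ { $type: "$message" }, "string" ] },
          // number of tokens minus 1 = number of "lupoK" occurrences
          { $subtract: [ { $size: { $split: [ "$message", "lupoK" ] } }, 1 ] },
          0
        ]
      }
    }
  }
];

// In the mongo shell, run it with: db.test.aggregate(pipeline)
```

    Filtering such documents out beforehand with a $match stage would be an alternative if they should not appear in the result at all.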
    


    NOTES:

    The $split operator splits the message string on a delimiter, in this case "lupoK", and returns an array of the tokens found between the delimiters. A string containing n occurrences of "lupoK" is split into n + 1 tokens (some of them possibly empty strings), so the number of tokens minus 1 is the number of occurrences of "lupoK".
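
    The counting idea can also be tried outside the database. The snippet below is a plain JavaScript analogue, not MongoDB code: String.prototype.split with a non-empty string delimiter produces the same kind of token array, and countOccurrences is a hypothetical helper name used only for illustration.

```javascript
// Plain JavaScript analogue of the $split / $size / $subtract expression.
// countOccurrences is a hypothetical helper, not a MongoDB function.
// Assumes needle is a non-empty string.
function countOccurrences(message, needle) {
  // n occurrences of needle split the message into n + 1 tokens
  const tokens = message.split(needle);
  return tokens.length - 1;
}

console.log(countOccurrences("first lupoK lupoK", "lupoK")); // 2
console.log(countOccurrences("lupoKlupoK", "lupoK"));        // 2
console.log(countOccurrences("HELLO WORLD", "lupoK"));       // 0
```
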

    Check the result with these sample message strings:

    "***first lupoK lupoK***"
    "lupoKlupoK"
    " lupoK lupoK "
    ""
    "lupoKlupoKlupoK"
    "lupoK"
    "HELLO * lupoK* WORLD"
    "HELLO WORLD"
    "***first lupoK lupoKlupoK lupoK***lupoK *** last lupoK."
    

    For example, the tokens for some strings: