mongodbmongodb-querypipelinepymongo-3.xarrayobject

MongoDB how to group objects in array and find the top 3


I want to find which are the top 3 places (if it has) that a user comments most.

More specifically, I have a collection review which has the user_id and the business_id and a collection business which has business_id and longitude and latitude. If the user has made a review from the location (38.551126, -110.880452) and (38.999999, -110.000000), we can say the user has 2 reviews at this location (38, -110).

Collection.review:

{
  "review_id": "KU_O5udG6zpxOg-VcAEodg",
  "user_id": "mh_-eMZ6K5RLWhZyISBhwA",
  "business_id": "XQfwVwDr-v0ZS3_CbbE5Xw",
  "stars": 3,
  "useful": 0,
  "funny": 0,
  "cool": 0,
  "text": "If you decide to eat here, ...",
  "date": "2018-07-07 22:09:11"
}

collection.business:

{
  "business_id": "XQfwVwDr-v0ZS3_CbbE5Xw",
  "name": "Turning Point of North Wales",
  "address": "1460 Bethlehem Pike",
  "city": "North Wales",
  "state": "PA",
  "postal_code": "19454",
  "latitude": 40.2101961875,
  "longitude": -75.2236385919,
  "stars": 3,
  "review_count": 169,
  "is_open": 1,
  "attributes": Object
  "categories": "Restaurants, Breakfast & Brunch, Food",
  "hours": Object
}

I am a beginner at mongodb, the only pipeline I have some result is this.

review.aggregate([{
 $match: {
  user_id: {
   $in: [some_list]
  }
 }
}, {
 $lookup: {
  from: 'business',
  localField: 'business_id',
  foreignField: 'business_id',
  as: 'business'
 }
}, {
 $unwind: {
  path: '$business'
 }
}, {
 $group: {
  _id: '$user_id',
  coordinates: {
   $push: {
    latitude: {
     $toInt: '$business.latitude'
    },
    longitude: {
     $toInt: '$business.longitude'
    }
   }
  },
  places: {
   $sum: 1
  }
 }
}])

output

_id: "xoZvMJPDW6Q9pDAXI0e_Ww"
coordinates: Array
    0: Object
        latitude: 39
        longitude: -119
    1: Object
        latitude: 39
        longitude: -119
    2: Object
        latitude: 39
        longitude: -119
    3: Object
        latitude: 39
        longitude: -119
    4: Object
        latitude: 39
        longitude: -119
places: 5

After that i process the result in python. Is there someway to do it from the pipeline immediately and have the result something like this (and limited to top 3)

_id: "xoZvMJPDW6Q9pDAXI0e_Ww"
top_places: Array
    0: Object
        latitude: 39
        longitude: -119
    1: Object
        latitude: 23
        longitude: 56


  

Solution

  • You just need to add an additional grouping stage, first group by location and user and only then group by user, like so:

    db.review.aggregate([
      {
        $lookup: {
          from: "business",
          localField: "business_id",
          foreignField: "business_id",
          as: "business"
        }
      },
      {
        $unwind: {
          path: "$business"
        }
      },
      {
        $group: {
          _id: {
            user: "$user_id",
            latitude: {
              $toInt: "$business.latitude"
            },
            longitude: {
              $toInt: "$business.longitude"
            }
          },
          places: {
            $sum: 1
          }
        }
      },
      {
        $group: {
          _id: "$_id.user",
          coordinates: {
            $push: {
              latitude: "$_id.latitude",
              longitude: "$_id.longitude",
              
            }
          },
          places: {
            $sum: "$places"
          }
        }
      }
    ])
    

    Mongo Playground