Let's say I have tagged conversations, and conversations have messages which have created_at and read_at(=nil). I'm looking to have, for all conversations in a tag, their unread messages and the last message regardless if its unread. I have this so far, but the last message is duplicated if it's unread... I'm looking to remove the duplicated message.
match = { '$match': {
'tag_ids': {
'$in': [
BSON::ObjectId('658db0d00f5b4f4471bdb3b6'), # inbox
],
}
} }
unread_messages = { '$lookup': {
'from': 'wco_email_message',
'localField': '_id',
'foreignField': 'conversation_id',
'pipeline': [
{ '$match': { 'read_at': nil } },
{ '$sort': { 'created_at': 1 } },
{ '$project': { 'created_at': 1 } },
],
'as': 'unread_messages',
} }
last_message = { '$lookup': {
'from': 'wco_email_message',
'localField': '_id',
'foreignField': 'conversation_id',
'pipeline': [
{ '$sort': { 'created_at': -1 } },
{ '$limit': 1 },
{ '$project': { 'created_at': 1 } },
],
'as': 'last_message',
} }
project = { '$project': {
"subject": 1,
"all_messages": { '$concatArrays': [ '$unread_messages', '$last_message' ] },
} }
group = { '$group': {
'_id': {
"subject": 1,
"all_messages": { '$addToSet': '$all_messages' },
},
} }
outs = WcoEmail::Conversation.collection.aggregate([
match,
unread_messages,
last_message,
project,
# group,
]).to_a
Output:
[{"_id"=>BSON::ObjectId('65985e070f5b4f7476a2b1c8'),
"subject"=>"Re: JavaScript Developer _ Austin, TX.",
"all_messages"=>
[{"_id"=>BSON::ObjectId('65985ba10f5b4f52505dbe1a'), "created_at"=>2024-01-05 19:42:25.238 UTC},
{"_id"=>BSON::ObjectId('65985e070f5b4f7476a2b1c9'), "created_at"=>2024-01-05 19:52:39.506 UTC},
{"_id"=>BSON::ObjectId('65985eda0f5b4f760e1f4265'), "created_at"=>2024-01-05 19:56:10.674 UTC},
{"_id"=>BSON::ObjectId('65985eda0f5b4f760e1f4265'), "created_at"=>2024-01-05 19:56:10.674 UTC}]},
{"_id"=>BSON::ObjectId('659f36550f5b4f114ea648ac'),
"subject"=>"You've been chosen!",
"all_messages"=>
[{"_id"=>BSON::ObjectId('659f3bdc0f5b4f1a31928d7c'), "created_at"=>2024-01-11 00:52:44.109 UTC},
{"_id"=>BSON::ObjectId('659f3bdc0f5b4f1a31928d7c'), "created_at"=>2024-01-11 00:52:44.109 UTC}]},
{"_id"=>BSON::ObjectId('659f3c970f5b4f1b3ee43a78'),
"subject"=>"Re: Wasya Co Inquiry",
"all_messages"=>[{"_id"=>BSON::ObjectId('659f3c970f5b4f1b3ee43a79'), "created_at"=>2024-01-11 00:55:51.927 UTC}]},
{"_id"=>BSON::ObjectId('65bb350b0f5b4f1e126bb4ee'),
"subject"=>"a Deploy Script",
"all_messages"=>
[{"_id"=>BSON::ObjectId('65bb37ea0f5b4f347256f121'), "created_at"=>2024-02-01 06:19:22.991 UTC},
{"_id"=>BSON::ObjectId('65bb37ea0f5b4f347256f121'), "created_at"=>2024-02-01 06:19:22.991 UTC}]},
{"_id"=>BSON::ObjectId('65d524d8767ccd3c1cf87d2b'),
"subject"=>"Resend of AWS Message",
"all_messages"=>
[{"_id"=>BSON::ObjectId('65d524d8767ccd3c1cf87d2c'), "created_at"=>2024-02-20 22:16:56.068 UTC},
{"_id"=>BSON::ObjectId('65d5258a767ccd3c1cf87d2d'), "created_at"=>2024-02-20 22:19:54.454 UTC},
{"_id"=>BSON::ObjectId('65d5258a767ccd3c1cf87d2d'), "created_at"=>2024-02-20 22:19:54.454 UTC}]}]
you can see with timestamps, the last two messages in all conversations are duplicated, except one where there are no unread messages.
Instead of using $concatArrays
to put the arrays together, you could use $setUnion
that treats the arrays as sets and removes duplicate elements.
You can use this mongoplayground to test; I have replicated the relevant parts of your sample (without the $group
stage at the end, and with the extra field read_at
so that you can discern between read/unread).