A field in my data is stored as a string instead of a real nested array:
{
"uuid": "yxcvb",
"product": "[{\"pid\":\"4f76c06825aa486db9a7a5cedbc2ea19\",\"size\":\"S\"},{\"pid\":\"d3055e0a1bb040dba65674b9733bf4ce\",\"size\":\"M\"}]",
"timestamp":{"$date":"2023-01-01T11:25:47+00:00"}
},
{
"uuid": "asdfg",
"product": "[{\"pid\":\"369b33cc768d496193b18d657d706920\",\"size\":\"M\"}]",
"timestamp":{"$date":"2023-01-02T11:25:47+00:00"}
},
{
"uuid": "qwert",
"product": "[{\"pid\":\"13325cb71fd7413c8f3e18caf20b5d6d\",\"size\":\"S\"},{\"pid\":\"ac431397d49d449cab44d98dc13ec57c\",\"size\":\"M\"},{\"pid\":\"ba35b9f804a44e15bf197d3ef671dc34\",\"size\":\"L\"}]",
"timestamp":{"$date":"2023-01-03T11:25:47+00:00"}
}
/// "product" has a maximum of 4 dict-like pid-size units
I would like to know the count of each size
and group by timestamp
. Expected output:
[
{
"week": 52,
"sizeS": 1
},
{
"week": 52,
"sizeM": 1
},
{
"week": 1,
"sizeS": 1
},
{
"week": 1,
"sizeM": 2
},
{
"week": 1,
"sizeL": 1
}
]
To realize this, the following problems need to be solved:
$unwind
.pid
and size
values in a document, and save only the value after the colon (e.g. "4f76c06825aa486db9a7a5cedbc2ea19", "M").size
within the grouped timeframe (week).You can use $function
to parse the json strings and use it as normal mongodb fields.
db.collection.aggregate([
{
"$set": {
"product": {
"$function": {
"body": "function(product) {return JSON.parse(product)}",
"args": [
"$product"
],
"lang": "js"
}
}
}
},
{
"$unwind": "$product"
},
{
$group: {
_id: {
week: {
"$isoWeek": "$timestamp"
},
size: "$product.size"
},
count: {
$sum: 1
}
}
},
{
$sort: {
"_id.week": -1
}
}
])