node.jsmongodbmongooseaggregation-frameworkaggregation

Fill missing dates in records


I have a collection of ProductViews:

{
    productId: "5b8c0f3204a10228b00a1745",
    createdAt: "2018-09-07T17:18:40.759Z"
}

And I have a query for fetching the daily views for a specific product:

ProductView.aggregate([
    { $match: { productId } },
    { $project: { day: { $substr: ["$createdAt", 0, 10] } } },
    {
        $group: {
            _id: "$day",
            count: { $sum: 1 },
            time: { $avg: "$createdAt" },
        }
    },
    { $sort: { _id: 1 } },
    {
        $project: {
            date: '$_id',
            views: '$count',
        },
    },
]).exec((err, result) => ...)

Which currently gives:

[
    { date: '2018-09-01', views: 1 },
    { date: '2018-09-02', views: 3 },
    { date: '2018-09-04', views: 2 },
    { date: '2018-09-05', views: 5 },
    // ...
]

Issue:

The issue is, that this aggregation does not return { date: '2018-09-03', views: 0 } for days with 0 views. This results in incorrect displaying of the data:

Graph of incorrectly displayed data

Results should look like:

[
    { date: '2018-09-01', views: 1 },
    { date: '2018-09-02', views: 3 },
    { date: '2018-09-03', views: 0 }, // <=
    { date: '2018-09-04', views: 2 },
    { date: '2018-09-05', views: 5 },
    // ...
]

P.S.: It would be perfect to pass in the start and end dates to output results based on this range


Solution

  • You need few additional stages to return default values. First of all you need to use $group with _id set to null to collect all results in one document. Then you can use $map with an array of days as an input. Inside that $map you can use $indexOfArray to find if that date exists in your current result set. If yes (index != -1) then you can return that value, otherwise you need to return default subdocument with views set to 0. Then you can use $unwind to get back a list of documents and $replaceRoot to promote nested stats to a top level.

    ProductView.aggregate([
        { $match: { productId: '5b8c0f3204a10228b00a1745' } },
        { $project: { day: { $substr: ["$createdAt", 0, 10] } } },
        {
            $group: {
                _id: "$day",
                count: { $sum: 1 },
                time: { $avg: "$createdAt" },
            }
        },
        { $sort: { _id: 1 } },
        {
            $project: {
                date: '$_id',
                views: '$count',
            },
        },
        {
            $group: {
                _id: null,
                stats: { $push: "$$ROOT" }
            }
        },
        {
            $project: {
                stats: {
                    $map: {
                        input: [ "2018-09-01", "2018-09-02", "2018-09-03", "2018-09-04", "2018-09-05" ],
                        as: "date",
                        in: {
                            $let: {
                                vars: { dateIndex: { "$indexOfArray": [ "$stats._id", "$$date" ] } },
                                in: { 
                                    $cond: {
                                        if: { $ne: [ "$$dateIndex", -1 ] },
                                        then: { $arrayElemAt: [ "$stats", "$$dateIndex" ] },
                                        else: { _id: "$$date", date: "$$date", views: 0 }
                                    } 
                                }
                            }
                        }
                    }
                }
            }
        },
        {
            $unwind: "$stats"
        },
        {
            $replaceRoot: {
                newRoot: "$stats"
            }
        }
    ]).exec((err, result) => ...)
    

    You can generate a static list of dates in your application logic using simple loop. I believe that's possible in MongoDB as well (using $range) but it might complicate this aggregation pipeline. Let me know if you're fine with that or you want to try to generate that array of dates in MongoDB.