arrayscouchbaseflattensql++unnest

Return specific item from array on condition in flat structure output


I want to output a flat data table (for a csv) where I use the field predictionModels.modelID as the header and the corresponding value of predictionModels.predictions.passengers when predictionModels.predictions.daysAhead = 1. The number of models is variable but limited, so I'm happy to hardcode them into the query but can't just UNNEST and output the result.

I'm reluctant to create an index as it's a big datastore that is live in production, and I'm the junior in the office literally having a play to get to know our data better, and I'm worried about doing anything persistent or that might affect resource availability elsewhere. I've been trying something like this-

SELECT departureDateTime, lineID, weight, 
CASE WHEN predictionModels.modelID = "model1" AND predictionModels.predictions.daysAhead=1 THEN predictionModels.predictions.passengers END AS Model1,
CASE WHEN predictionModels.modelID = "model2" AND predictionModels.predictions.daysAhead=1 THEN predictionModels.predictions.passengers END AS Model2
FROM bucket

Sample Data:

    'departureDateTime': '2022-12-23T00:10:00+00:00',
    'lineID': 'f2b4d1d9',
    'weight': '630'
    'predictionModels': [
        {
        "modelID":"model1",
        "predictions": [
            {"daysAhead":1, "passengers":11},
            {"daysAhead":2, "passengers":12},
            {"daysAhead":3, "passengers":13},
            {"daysAhead":4, "passengers":14}
                ]
        },
        {
     "modelID":"model2",
     "predictions": [
            {"daysAhead":1, "passengers":21},
            {"daysAhead":2, "passengers":22},
            {"daysAhead":3, "passengers":23},
            {"daysAhead":4, "passengers":24}
                ]
        }
    ]
},
{
    'departureDateTime': '2023-01-24T00:17:00+00:00',
    'lineID': 'f2b4d1d9',
    'weight': '520'
    'predictionModels': [
    {
        "modelID":"model2",
        "predictions": [
            {"daysAhead":1, "passengers":210},
            {"daysAhead":2, "passengers":220},
            {"daysAhead":3, "passengers":230},
            {"daysAhead":4, "passengers":240}
        ]
    }
    ]
}

Desired output:

        departureDateTime |   lineID | weight | model1 | model2
---------------------------------------------------------------
2022-12-23T00:10:29+00:00 | f2b4d1d9 |   630  |     11 |     21
2023-01-24T00:17:29+00:00 | f2b4d1d9 |   520  |    Nan |    210

Solution

  • Covert ARRAY into flatten OBJECT (each field of the OBJECT becomes column in CVS).

    SELECT b.departureDateTime, b.lineID, b.weight,
           OBJECT pm.modelID:(FIRST p.passengers FOR p IN pm.predictions WHEN p.daysAhead = 1 END)
                  FOR pm IN b.predictionModels
                  END.*
    FROM bucket AS b;