azure-cosmosdbazure-cosmosdb-sqlapi

Azure CosmosDB NoSQL GroupBy with Aggregate over multiple columns


EDIT : To focus on real question updated the numeric values (not strings) and fixed a copy-paste error in query.

The PROBLEM :

I have documents in cosmosdb which store weather sensor measurements like this :

        "Body": {
            "Type": "Telemetry",
            "Temp": 4.43,
            "Hum": 77.65,
            "Baro": 935.77,
            "BaroSLP": 1010.90,
            "Alt": 666.05,
            "SBattery": 3751.00,
            "GBattery": 4466,
            "RSSI": -81,
            "Version": 8,
            "TimeStamp": "2025-03-15T16:11:35Z"
        }

I want to query documents with these values averaged out hourly.

I crafted this query what I would normally do with SQL :

SELECT AVG(i.Body.Temp) as Temp, AVG(i.Body.Hum) as Hum, AVG(i.Body.Baro) as Baro, AVG(i.Body.BaroSLP) as BaroSLP, AVG(i.Body.Alt) as Alt, AVG(i.Body.SBattery) as Battery, Max(i.Body.TimeStamp) as Time
FROM iothub i
WHERE i.partitionKey = @key and i.Body.TimeStamp >= @fromTime and i.Body.TimeStamp <= @toTime
group by datetimepart("hh", i.Body.TimeStamp)

This does not work. I learnt that for AVG I need VALUE similar to this:

SELECT VALUE AVG(i.Body.Temp)
FROM iothub i
WHERE i.partitionKey = @key and i.Body.TimeStamp >= @fromTime and i.Body.TimeStamp <= @toTime
group by datetimepart("hh", i.Body.TimeStamp)
...

The QUESTION :

Even if above works, I am unable to find a syntactically correct way to use aggregates over multiple columns, like below (bad) example :

SELECT VALUE AVG(i.Body.Temp) as Temp, VALUE AVG(i.Body.Hum) as Hum, ...
...
...

So is this achivable somehow with Azure CosmosDB NoSQL langauge ?


Solution

  • Thanks @Martin for your insights. Yes, @Laszlo I tried in my environment and it works successfully when the data is converted from string to number by using StringToNumber function in the query. I tried by using below query, it uses stringToNUmber for converting string to number. Also, uses AVG() to calculate the average of each numeric field for that particular hour.

    SELECT 
        datetimepart("hh", i.Body.TimeStamp) AS hh,
        AVG(StringToNumber(i.Body.Temp)) AS Temp,
        AVG(StringToNumber(i.Body.Hum)) AS Hum,
        AVG(StringToNumber(i.Body.Baro)) AS Baro,
        AVG(StringToNumber(i.Body.BaroSLP)) AS BaroSLP,
        AVG(StringToNumber(i.Body.Alt)) AS Alt,
        AVG(StringToNumber(i.Body.SBattery)) AS SBattery,
        MAX(i.Body.TimeStamp) AS TimeStamp,
        COUNT(1) AS Count
    FROM iothub i
    GROUP BY datetimepart("hh", i.Body.TimeStamp)
    

    Output:

    [
        {
            "hh": 17,
            "Temp": 6.1,
            "Hum": 75.65,
            "Baro": 937.15,
            "BaroSLP": 1011.1500000000001,
            "Alt": 668.75,
            "SBattery": 3747.75,
            "TimeStamp": "2025-03-15T17:50:30Z",
            "Count": 2
        },
        {
            "hh": 16,
            "Temp": 5.016666666666667,
            "Hum": 76.91666666666667,
            "Baro": 936.1233333333333,
            "BaroSLP": 1010.9499999999999,
            "Alt": 666.9833333333333,
            "SBattery": 3750.1666666666665,
            "TimeStamp": "2025-03-15T16:55:20Z",
            "Count": 3
        }
    ]