
How to group data with SQL


How do I group my Timestream data?

A simplified version of the table looks like this:

point_delivery_number | measure_name  | time                   | value
------------------------------------------------------------------------
AT3265345345          | "consumption" | 2021-01-02 12:00:00.00 | 0.13
AT3265345345          | "generation"  | 2021-01-02 12:00:00.00 | 0.32

I want to query for rows where point_delivery_number = xx and time = xx.

The result should be:

point_delivery_number | consumption | time  | generation
----------------------------------------------------------
AT3265345345          |  0.13       | xxxxx | 0.32

What I tried:

    SELECT point_delivery_number, measure_name, time, measure_value::double 
    FROM "energy_datapoints"."energy_data"
    WHERE point_delivery_number='AT234123234541243'
    GROUP BY point_delivery_number, measure_name, time, measure_value::double;

The result is:

point_delivery_number | measure_name  | time                   | value
------------------------------------------------------------------------
AT3265345345          | "generation"  | 2021-01-02 12:15:00.00 | 0.123
AT3265345345          | "generation"  | 2021-01-02 12:00:00.00 | 0.32

I want consumption and generation to be columns of their own rather than values in the measure_name column.


Solution

  • You are dealing with a key/value (entity-attribute-value) table. For each point_delivery_number and time, it has one row per key (measure_name) holding that key's value.

    You want the values of two keys as two columns. One way is to select the rows for each key separately and join them:

    select
      point_delivery_number,
      time,
      c.value as consumption,
      g.value as generation
    from 
      (select * from energy_datapoints.energy_data where measure_name = 'consumption') c
    full outer join
      (select * from energy_datapoints.energy_data where measure_name = 'generation') g
        -- join on both key columns, so values from different times are not paired up
        using (point_delivery_number, time)
    order by point_delivery_number, time;
    
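The join approach can be tried locally. The sketch below is a minimal, hedged example that uses Python's built-in sqlite3 module as a stand-in for Timestream, with hypothetical sample rows modeled on the question's table. It joins on both point_delivery_number and time, so measurements taken at different times are never paired with each other. Because SQLite versions before 3.39 lack FULL OUTER JOIN, it emulates one by left-joining each measure onto the distinct (point_delivery_number, time) keys; this is a swapped-in variant of the join above, not the query itself.

```python
import sqlite3

# Hypothetical sample rows modeled on the simplified table in the question.
ROWS = [
    ("AT3265345345", "consumption", "2021-01-02 12:00:00.00", 0.13),
    ("AT3265345345", "generation", "2021-01-02 12:00:00.00", 0.32),
    ("AT3265345345", "generation", "2021-01-02 12:15:00.00", 0.123),
]


def pivot_with_join(rows):
    """Return one (point_delivery_number, time, consumption, generation) tuple
    per distinct (point_delivery_number, time), built with joins."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE energy_data ("
        "point_delivery_number TEXT, measure_name TEXT, time TEXT, value REAL)"
    )
    con.executemany("INSERT INTO energy_data VALUES (?, ?, ?, ?)", rows)
    # FULL OUTER JOIN is emulated: collect every (point_delivery_number, time)
    # key first, then LEFT JOIN each measure onto it; a missing measure is NULL.
    query = """
        SELECT k.point_delivery_number, k.time,
               c.value AS consumption, g.value AS generation
        FROM (SELECT DISTINCT point_delivery_number, time FROM energy_data) AS k
        LEFT JOIN (SELECT * FROM energy_data
                   WHERE measure_name = 'consumption') AS c
          ON c.point_delivery_number = k.point_delivery_number AND c.time = k.time
        LEFT JOIN (SELECT * FROM energy_data
                   WHERE measure_name = 'generation') AS g
          ON g.point_delivery_number = k.point_delivery_number AND g.time = k.time
        ORDER BY k.point_delivery_number, k.time
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


for row in pivot_with_join(ROWS):
    print(row)
```

The 12:15 row has no consumption measurement, so its consumption column comes back as None (SQL NULL) instead of being dropped.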

    Another way is conditional aggregation. You want one row per point_delivery_number and time, so you GROUP BY those two columns. Then apply MIN (or MAX) to a CASE expression, so that each column only picks up the value of its own measure name: the CASE yields NULL for all other rows, and MIN ignores NULLs.

    select
      point_delivery_number,
      time,
      min(case when measure_name = 'consumption' then value end) as consumption,
      min(case when measure_name = 'generation' then value end) as generation
    from energy_datapoints.energy_data
    group by point_delivery_number, time
    order by point_delivery_number, time;
    
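The aggregation approach can be checked the same way. This is a minimal sketch, again assuming Python's sqlite3 module as a stand-in for Timestream and the same hypothetical sample rows; it groups by point_delivery_number and time and uses MIN over a CASE expression for each measure.

```python
import sqlite3

# Hypothetical sample rows modeled on the simplified table in the question.
ROWS = [
    ("AT3265345345", "consumption", "2021-01-02 12:00:00.00", 0.13),
    ("AT3265345345", "generation", "2021-01-02 12:00:00.00", 0.32),
    ("AT3265345345", "generation", "2021-01-02 12:15:00.00", 0.123),
]


def pivot_with_aggregation(rows):
    """Return one (point_delivery_number, time, consumption, generation) tuple
    per (point_delivery_number, time) group, built with conditional aggregation."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE energy_data ("
        "point_delivery_number TEXT, measure_name TEXT, time TEXT, value REAL)"
    )
    con.executemany("INSERT INTO energy_data VALUES (?, ?, ?, ?)", rows)
    # Each CASE returns the value only for its own measure_name and NULL
    # otherwise; MIN skips the NULLs, leaving that measure's value (or NULL).
    query = """
        SELECT point_delivery_number, time,
               MIN(CASE WHEN measure_name = 'consumption' THEN value END) AS consumption,
               MIN(CASE WHEN measure_name = 'generation' THEN value END) AS generation
        FROM energy_data
        GROUP BY point_delivery_number, time
        ORDER BY point_delivery_number, time
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


for row in pivot_with_aggregation(ROWS):
    print(row)
```

Compared with the join, this reads the table only once and needs no FULL OUTER JOIN support, and adding another measure is just one more MIN(CASE ...) column.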

    Disclaimer: I don't know Amazon Timestream. The queries above are standard SQL and should work, as written or with slight changes, in most RDBMS. In Timestream the value column is measure_value::double, as in your own query, so use that instead of value.

    As to your own query: it looks as if you are aggregating, but you are in fact selecting single rows, because your GROUP BY clause includes all columns. GROUP BY ____ means "I want to aggregate my data to get one result row per ____". You want one result row per point_delivery_number and time, so GROUP BY point_delivery_number, time.
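
To see why grouping by every column does not aggregate anything, a minimal sketch (assuming Python's sqlite3 module and the same hypothetical sample rows; count_groups is a hypothetical helper) counts how many result rows each GROUP BY produces.

```python
import sqlite3

# Hypothetical sample rows modeled on the simplified table in the question.
ROWS = [
    ("AT3265345345", "consumption", "2021-01-02 12:00:00.00", 0.13),
    ("AT3265345345", "generation", "2021-01-02 12:00:00.00", 0.32),
    ("AT3265345345", "generation", "2021-01-02 12:15:00.00", 0.123),
]


def count_groups(rows, group_by):
    """Return how many result rows GROUP BY <group_by> yields.
    group_by is pasted into the SQL, so pass only trusted column names."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE energy_data ("
        "point_delivery_number TEXT, measure_name TEXT, time TEXT, value REAL)"
    )
    con.executemany("INSERT INTO energy_data VALUES (?, ?, ?, ?)", rows)
    sql = f"SELECT COUNT(*) FROM (SELECT 1 FROM energy_data GROUP BY {group_by})"
    (count,) = con.execute(sql).fetchone()
    con.close()
    return count


# Grouping by all columns: every distinct input row is its own group.
print(count_groups(ROWS, "point_delivery_number, measure_name, time, value"))  # 3
# Grouping by the row identity you actually want: one group per time.
print(count_groups(ROWS, "point_delivery_number, time"))  # 2
```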