sqlpostgresqlwindow-functionsuuid

How to assign unique UUIDs to groups of metrics in a PostgreSQL table with repeated names?


I’m working with a PostgreSQL table that stores metric data for different assets. The table currently has over 1 billion records.

Each update will have multiple metrics, e.g., speed, distance, elevation, heading, temp. Each metric is stored as an individual row with asset_id, timestamp, metric_name, and metric_value.

I want to go back and assign a unique UUID (update_id) to each group of metrics recorded at the same asset_id and timestamp. However, if a metric_name is repeated within the same asset_id and timestamp, I need it to start a new update_id.
db<>fiddle

create table example_input(
  asset_id uuid
 ,timestamp timestamp(0)
 ,metric_name text
 ,metric_value int);

insert into example_input values
 ('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:14','distance',100)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:14','speed',60)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:14','elevation',15)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:14','distance',120)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:14','speed',80)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:19','distance',140)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:19','elevation',20);

Expected Output:

asset_id timestamp metric_name metric_value update_id
07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:14 distance 100 6e0c8f3a-9c92-4a99-b0ff-7e6ac1cd6bf2
07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:14 speed 60 6e0c8f3a-9c92-4a99-b0ff-7e6ac1cd6bf2
07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:14 elevation 15 6e0c8f3a-9c92-4a99-b0ff-7e6ac1cd6bf2
07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:14 distance 120 f9dcfa8e-95c4-498d-8a59-30233d1786b3
07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:14 speed 80 f9dcfa8e-95c4-498d-8a59-30233d1786b3
07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:19 distance 140 78ec74e7-a5dc-4673-99e3-ef26b76b2d42
07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:19 elevation 20 78ec74e7-a5dc-4673-99e3-ef26b76b2d42

Due to hardware limitations, the devices only have 1 second resolution, and there are (rare) instances when the device will send more than one update in the same second. In this case, I want to detect a repeated asset_id, timestamp and metric_name and have it generate a new uuid for the second update with the same second. This is the part that's giving the most grief that I can't figure out.


Solution

  • The uuid_generate_v3() function from the "uuid-ossp" extension lets you generate uuids in a repeatable manner. Use asset_id as the namespace and timestamp as the seed.

    create extension if not exists "uuid-ossp";
    

    To differentiate between multiple entries for the same metric_name on the same timestamp, you can add the row_number() window function to the seed.

    As a select: demo at db<>fiddle

    select*,uuid_generate_v3( asset_id
                             ,concat(timestamp,row_number()over w))as update_id
    from example_input
    window w as(partition by asset_id,timestamp,metric_name
                order by ctid)
    

    As an actual update:

    alter table example_input add column update_id uuid;
    
    update example_input as a 
    set update_id=uuid_generate_v3(asset_id,concat(timestamp,( 
            select row_number()over(order by ctid)
             from example_input as c
             where a.asset_id=c.asset_id
               and a.timestamp=c.timestamp
               and a.metric_name=c.metric_name
             order by c.ctid<>a.ctid limit 1)))--only get the current row's rownum
    
    asset_id timestamp metric_name metric_value update_id
    07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:14 distance 100 7e88679b-5038-3124-b0e6-a4ed16316ba4
    07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:14 speed 60 7e88679b-5038-3124-b0e6-a4ed16316ba4
    07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:14 elevation 15 7e88679b-5038-3124-b0e6-a4ed16316ba4
    07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:14 distance 120 d1ca0905-7fa1-3f8c-9284-2d615bdaacd3
    07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:14 speed 80 d1ca0905-7fa1-3f8c-9284-2d615bdaacd3
    07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:19 distance 140 40e25bd6-55eb-34cd-a8c5-262cd9d8987b
    07536cba-6e00-4b8a-be02-9509228f4bb0 2024-12-16 17:43:19 elevation 20 40e25bd6-55eb-34cd-a8c5-262cd9d8987b

    The ctid is a hidden system column holding each row's page and tuple number. You can still break the ties without it (remove order by ctid in both cases), but a lower ctid typically indicates an earlier entry which could make some sense in your case. The resulting uuid obfuscates that order (earlier entry won't translate to a "lower identifier") but it pairs up Nth metrics in each second: third speed for a given second ends up under the same update_id as the third elevation.