How to assign unique UUIDs to groups of metrics in a PostgreSQL table with repeated names?

I’m working with a PostgreSQL table that stores metric data for different assets. The table currently has over 1 billion records.

Each update will have multiple metrics, e.g., speed, distance, elevation, heading, temp. Each metric is stored as an individual row with asset_id, timestamp, metric_name, and metric_value.

I want to go back and assign a unique UUID (update_id) to each group of metrics recorded at the same asset_id and timestamp. However, if a metric_name is repeated within the same asset_id and timestamp, I need it to start a new update_id.
_db<>fiddle

create table example_input(
  asset_id uuid
 ,timestamp timestamp(0)
 ,metric_name text
 ,metric_value int);

insert into example_input values
 ('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:14','distance',100)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:14','speed',60)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:14','elevation',15)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:14','distance',120)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:14','speed',80)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:19','distance',140)
,('07536cba-6e00-4b8a-be02-9509228f4bb0','2024-12-16 17:43:19','elevation',20);

Expected Output:

asset_id	timestamp	metric_name	metric_value	update_id
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:14	distance	100	6e0c8f3a-9c92-4a99-b0ff-7e6ac1cd6bf2
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:14	speed	60	6e0c8f3a-9c92-4a99-b0ff-7e6ac1cd6bf2
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:14	elevation	15	6e0c8f3a-9c92-4a99-b0ff-7e6ac1cd6bf2
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:14	distance	120	f9dcfa8e-95c4-498d-8a59-30233d1786b3
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:14	speed	80	f9dcfa8e-95c4-498d-8a59-30233d1786b3
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:19	distance	140	78ec74e7-a5dc-4673-99e3-ef26b76b2d42
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:19	elevation	20	78ec74e7-a5dc-4673-99e3-ef26b76b2d42

Due to hardware limitations, the devices only have 1 second resolution, and there are (rare) instances when the device will send more than one update in the same second. In this case, I want to detect a repeated asset_id, timestamp and metric_name and have it generate a new uuid for the second update with the same second. This is the part that's giving the most grief that I can't figure out.

Solution

The uuid_generate_v3() function from the "uuid-ossp" extension lets you generate uuids in a repeatable manner. Use asset_id as the namespace and timestamp as the seed.

create extension if not exists "uuid-ossp";

To differentiate between multiple entries for the same metric_name on the same timestamp, you can add the row_number() window function to the seed.

As a select: _{demo at db<>fiddle}

select*,uuid_generate_v3( asset_id
                         ,concat(timestamp,row_number()over w))as update_id
from example_input
window w as(partition by asset_id,timestamp,metric_name
            order by ctid)

As an actual update:

alter table example_input add column update_id uuid;

update example_input as a 
set update_id=uuid_generate_v3(asset_id,concat(timestamp,( 
        select row_number()over(order by ctid)
         from example_input as c
         where a.asset_id=c.asset_id
           and a.timestamp=c.timestamp
           and a.metric_name=c.metric_name
         order by c.ctid<>a.ctid limit 1)))--only get the current row's rownum

asset_id	timestamp	metric_name	metric_value	update_id
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:14	distance	100	7e88679b-5038-3124-b0e6-a4ed16316ba4
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:14	speed	60	7e88679b-5038-3124-b0e6-a4ed16316ba4
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:14	elevation	15	7e88679b-5038-3124-b0e6-a4ed16316ba4
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:14	distance	120	d1ca0905-7fa1-3f8c-9284-2d615bdaacd3
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:14	speed	80	d1ca0905-7fa1-3f8c-9284-2d615bdaacd3
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:19	distance	140	40e25bd6-55eb-34cd-a8c5-262cd9d8987b
07536cba-6e00-4b8a-be02-9509228f4bb0	2024-12-16 17:43:19	elevation	20	40e25bd6-55eb-34cd-a8c5-262cd9d8987b

The ctid is a hidden system column holding each row's page and tuple number. You can still break the ties without it (remove order by ctid in both cases), but a lower ctid typically indicates an earlier entry which could make some sense in your case. The resulting uuid obfuscates that order (earlier entry won't translate to a "lower identifier") but it pairs up Nth metrics in each second: third speed for a given second ends up under the same update_id as the third elevation.