database time-series questdb

Get First Write Wins deduplication per minute on QuestDB


I have a table capturing data from different sensors. The sensors only send data when there is a change over a threshold, but until the readings stabilize again a sensor can be very noisy, sending intermediate values we don't need. I would like to keep only the first reading from each sensor per minute, ignoring any further readings from the same sensor until at least one minute later.

I've been trying to use QuestDB deduplication, but this will not work for two reasons: DEDUP only replaces rows whose upsert keys match exactly, so readings with different timestamps within the same minute are all kept, and when the keys do match it is the last write that wins, not the first.

This is my table:

CREATE TABLE 'temp_sensors' ( 
    MeasurementTimestamp TIMESTAMP,
    DoorName SYMBOL CAPACITY 750 CACHE,
    Temperature DOUBLE, 
    BatteryLife DOUBLE  
) timestamp(MeasurementTimestamp) PARTITION BY MONTH WAL
DEDUP UPSERT KEYS(MeasurementTimestamp,DoorName);

And this is a sample of readings from one sensor over 4 minutes. I would like to keep only 4 readings (one per minute) for that sensor over that period.

"MeasurementTimestamp","DoorName","Temperature","BatteryLife"
"2025-02-25T17:50:55.480253Z","A000",22.0,77.0
"2025-02-25T17:51:05.798212Z","A000",16.0,77.0
"2025-02-25T17:51:19.053723Z","A000",16.3,77.0
"2025-02-25T17:51:24.720257Z","A000",16.6,77.0
"2025-02-25T17:51:29.267525Z","A000",16.9,77.0
"2025-02-25T17:51:34.876367Z","A000",17.2,77.0
"2025-02-25T17:51:39.840709Z","A000",17.5,77.0
"2025-02-25T17:51:44.112376Z","A000",17.8,77.0
"2025-02-25T17:51:49.772328Z","A000",18.1,77.0
"2025-02-25T17:51:54.214615Z","A000",18.4,77.0
"2025-02-25T17:51:58.996547Z","A000",18.7,77.0
"2025-02-25T17:52:06.004298Z","A000",19.0,77.0
"2025-02-25T17:52:15.529426Z","A000",19.3,76.0
"2025-02-25T17:52:20.132906Z","A000",19.6,76.0
"2025-02-25T17:52:23.703018Z","A000",19.9,76.0
"2025-02-25T17:52:29.206239Z","A000",20.2,76.0
"2025-02-25T17:52:35.720668Z","A000",20.5,76.0
"2025-02-25T17:52:43.671002Z","A000",20.8,76.0
"2025-02-25T17:52:49.391459Z","A000",21.1,76.0
"2025-02-25T17:52:58.643473Z","A000",21.4,76.0
"2025-02-25T17:53:04.187488Z","A000",21.7,76.0
"2025-02-25T17:53:11.118511Z","A000",22.0,76.0

Solution

  • The best solution is probably to use the original table as a "staging area" and create a materialized view that deduplicates the events into a second table:

    CREATE MATERIALIZED VIEW sensors_per_minute AS (
      SELECT
             MeasurementTimestamp,
             DoorName,
             FIRST(Temperature) AS Temperature,
             FIRST(BatteryLife) AS BatteryLife
             -- , FIRST(MeasurementTimestamp) AS timestamp
      FROM temp_sensors SAMPLE BY 1m
    ) PARTITION BY MONTH;
    

    If the original timestamp is important, it can be obtained with FIRST(MeasurementTimestamp). I left it commented out in the query, as I am fine with having every row start at second 00, but I am leaving it there in case it helps.
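    For reference, this is the same view with the commented column enabled, which adds a timestamp column holding the original reading time:

    CREATE MATERIALIZED VIEW sensors_per_minute AS (
      SELECT
             MeasurementTimestamp,
             DoorName,
             FIRST(Temperature) AS Temperature,
             FIRST(BatteryLife) AS BatteryLife,
             FIRST(MeasurementTimestamp) AS timestamp
      FROM temp_sensors SAMPLE BY 1m
    ) PARTITION BY MONTH;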

    With this materialized view and the sample data above, the table now contains the following (this output was generated with the optional FIRST(MeasurementTimestamp) column uncommented, hence the extra timestamp column):

    SELECT * FROM sensors_per_minute;
    
    "MeasurementTimestamp","DoorName","Temperature","BatteryLife","timestamp"
    "2025-02-25T17:50:00.000000Z","A000",22.0,77.0,"2025-02-25T17:50:55.480253Z"
    "2025-02-25T17:51:00.000000Z","A000",16.0,77.0,"2025-02-25T17:51:05.798212Z"
    "2025-02-25T17:52:00.000000Z","A000",19.0,77.0,"2025-02-25T17:52:06.004298Z"
    "2025-02-25T17:53:00.000000Z","A000",21.7,76.0,"2025-02-25T17:53:04.187488Z"
    

    It would also make sense to ALTER TABLE and set a TTL on the original table, so raw data is automatically expired after a short period and only the materialized data is kept.
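    For example, assuming one week of raw data is enough to keep (a hypothetical retention period; note that QuestDB expires whole partitions, so with monthly partitions the raw data can outlive the TTL by up to a month):

    -- Keep raw readings for one week; the materialized view retains
    -- the deduplicated per-minute rows independently.
    ALTER TABLE temp_sensors SET TTL 1 WEEK;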