clickhouse

Keep track of first row in ClickHouse


I have a table with users actions:

user_id action timestamp
1 PAGEVIEW 00:01
1 PAGEVIEW 00:02
1 PAY 00:05

How do I create a materialized view to have timestamps of first user's appearance (when first action was made)?

In this case, the result should be 1 (user_id), 00:01 (first action)

It is like ReplacingMergeTree backwards, I need to keep track of the first, not latest row, in my ordering.


Solution

  • You may use a version column with calculated expression which value is higher on a lower timestamp argument.

    CREATE TABLE tab
    (
        user_id     UInt32
      , action      LowCardinality(String)
      , timestamp   DateTime
    ) Engine = MergeTree()
    ORDER BY user_id;
    
    CREATE TABLE tab_rmt
    (
        user_id     UInt32
      , action      LowCardinality(String)
      , timestamp   DateTime
      , __ver       UInt32
    ) Engine = ReplacingMergeTree(__ver)
    ORDER BY user_id;
    
    CREATE MATERIALIZED VIEW tab_mv TO tab_rmt AS
    SELECT *, 4294967295 - toUnixTimestamp(timestamp) AS __ver
    FROM tab;
    
    INSERT INTO tab
    SELECT *
    FROM VALUES 
    (
        'user_id UInt32
      , action  LowCardinality(String)
      , timestamp   DateTime'
    , (1, 'PAGEVIEW', toDateTime('2025-01-01 00:01:00'))
    , (1, 'PAGEVIEW', toDateTime('2025-01-01 00:02:00'))
    , (1, 'PAY',      toDateTime('2025-01-01 00:03:00'))
    );
    
    SELECT * FROM tab_rmt FINAL FORMAT Vertical;
    

    The result is:

    Row 1:
    ──────
    user_id:   1
    action:    PAGEVIEW
    timestamp: 2025-01-01 00:01:00
    __ver:     2559277635 -- 2.56 billion
    

    fiddle