database, time-series, questdb, change-data-capture

Get notified about changes to a QuestDB table


What's the best way to get notified about changes to my QuestDB tables? I've searched the docs but all I can find are references to QuestDB being the destination of Change Data Capture. What I want is the other way around. I would like to know when data is coming into QuestDB.

At the moment I am doing my own version of a long poll: every few seconds I run select * from table where timestamp > latest_known_timestamp. But that has some downsides: I cannot tell whether data was updated/upserted rather than appended, and in the rare cases where data arrives late, it can land after my polling window has already passed.
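For reference, the current approach can be sketched like this (a minimal sketch; the trades table name, the column layout, and the helper names are assumptions, and a real fetch would go over the Postgres wire protocol):

```python
import time

def build_poll_query(table, latest_ts):
    # Incremental polling query, as described above. The table and
    # designated timestamp column names are assumptions.
    return f"SELECT * FROM {table} WHERE timestamp > '{latest_ts}'"

def poll_once(fetch, table, latest_ts):
    # fetch: any callable that executes SQL and returns rows whose
    # first element is the designated timestamp.
    rows = fetch(build_poll_query(table, latest_ts))
    if rows:
        latest_ts = max(r[0] for r in rows)
    return rows, latest_ts

def poll_forever(fetch, table, latest_ts, interval=5.0):
    # The long-poll loop: re-query every few seconds.
    while True:
        rows, latest_ts = poll_once(fetch, table, latest_ts)
        if rows:
            yield rows
        time.sleep(interval)
```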


Solution

  • While not really CDC, QuestDB has an undocumented function that can help with this: wal_transactions('table_name').

    This function provides a breakdown of the table's latest transactions. It includes some key fields that may be useful:

    sequencerTxn
    timestamp
    walId
    segmentId
    segmentTxn
    structureVersion
    minTimestamp
    maxTimestamp
    rowCount
    alterCommandType
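    On the application side, detecting new activity then comes down to tracking the highest sequencerTxn already processed. A minimal sketch, assuming the rows returned by wal_transactions('table_name') are represented as dicts with the field names listed above (the helper name is hypothetical):

```python
def new_transactions(rows, last_seen_txn):
    # rows: the result of wal_transactions('table_name'), one dict per
    # transaction, using the field names listed above.
    # Returns transactions committed after last_seen_txn, in commit order.
    fresh = [r for r in rows if r["sequencerTxn"] > last_seen_txn]
    return sorted(fresh, key=lambda r: r["sequencerTxn"])
```

    Your application would remember the highest sequencerTxn it has handled and pass it as last_seen_txn on the next poll.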
    

    By default, the minTimestamp, maxTimestamp, and rowCount fields will always be null. To populate them, we can set the following config:

    cairo.default.sequencer.part.txn.count=180000

    This sets how many transactions are retained in the transactions table before it rolls over; once the count exceeds 180k, that table is truncated. A day has 86,400 seconds, so depending on how often you send transactions to QuestDB, the example 180k may be too large or too small. Adjust it as needed.
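    To sanity-check the number against your own ingest rate, the arithmetic is simply count divided by commit rate (a sketch; the helper name is made up):

```python
SECONDS_PER_DAY = 86_400

def retention_seconds(txn_count, txns_per_second):
    # Rough wall-clock coverage of the transaction log: how long it takes
    # to accumulate txn_count transactions at the given commit rate.
    return txn_count / txns_per_second
```

    At one transaction per second, 180,000 entries cover a bit over two days; at 100 transactions per second, only about half an hour.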

    Your application can then read this table and use the metadata to know when rows have been inserted into the database. If you only need to know that rows were inserted, and how many, that is all you need. If you also want the data from the new rows, you can always select from the original table. Note that this works only when data is append-only, with no out-of-order data or updates.

    If you do have out of order data or updates, you will still know in which timestamp range the changes were made (via minTimestamp and maxTimestamp) and how many rows were affected (rowCount).

    One workaround for this is to add an auditing timestamp, e.g. updated_at or similar, to your table. It can be set on the application side by sending the current timestamp, or by using now() as part of a Postgres INSERT statement.
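    A sketch of what that could look like, assuming a hypothetical trades table (table and column names are illustrative):

```sql
-- Hypothetical table and columns, for illustration only.
ALTER TABLE trades ADD COLUMN updated_at TIMESTAMP;

-- Set the audit timestamp server-side in a Postgres-protocol INSERT:
INSERT INTO trades (timestamp, price, updated_at)
VALUES ('2024-01-01T00:00:00.000000Z', 101.5, now());
```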

    When pulling new rows, you can then select the range defined by the min and max timestamps, and filter the results by this audit timestamp.
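    Putting the two together, the follow-up query could be built like this (a sketch; the table name, column names, and helper name are all assumptions):

```python
def build_changed_rows_query(table, min_ts, max_ts, last_pull_ts,
                             ts_col="timestamp", audit_col="updated_at"):
    # Restrict the scan to the affected designated-timestamp range reported
    # by wal_transactions (minTimestamp/maxTimestamp), then keep only rows
    # whose audit timestamp is newer than the previous pull.
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_col} >= '{min_ts}' AND {ts_col} <= '{max_ts}' "
        f"AND {audit_col} > '{last_pull_ts}'"
    )
```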