I have some processes ingesting data on QuestDB and I can see some spikes on my disk usage on the server. I am trying to understand which tables are affecting the spikes and found out that on QuestDB I can calculate write amplification like this
questdb_physically_written_rows_total / questdb_committed_rows_total
So I am calling the metrics endpoint and this is the information I get
curl -G "http://localhost:9003/metrics"|grep -b2 -a2 _rows_
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 9252 0 9252 0 0 68141 0 --:--:-- --:--:-- --:--:-- 68533
1441-questdb_o3_commits_total 4
1468-
1469:# TYPE questdb_committed_rows_total counter
1513:questdb_committed_rows_total 1000098
1550-
1551-# TYPE questdb_rollbacks_total counter
1590-questdb_rollbacks_total 0
1616-
1617:# TYPE questdb_physically_written_rows_total counter
1670:questdb_physically_written_rows_total 2382248
1716-
1717:# TYPE questdb_wal_apply_physically_written_rows_total counter
1780:questdb_wal_apply_physically_written_rows_total 2382154
1836-
1837:# TYPE questdb_wal_apply_written_rows_total counter
1889:questdb_wal_apply_written_rows_total 1000004
1934-
1935:# TYPE questdb_wal_apply_rows_per_second gauge
1982:questdb_wal_apply_rows_per_second 506200
2023-
2024:# TYPE questdb_wal_written_rows_total counter
2070:questdb_wal_written_rows_total 1000004
2109-
2110-# TYPE questdb_memory_tag_MMAP_DEFAULT gauge
I understand with this I can calculate the write amplification at any point in time, but since these metrics are not persisted I would need to configure prometheus to capture the metrics, and in any case those are aggregated for the whole database, but I want to get the metrics per table.
I checked the docs and cannot find anything to per-table metrics, but I see there is some telemetry table where apparently data should be available.
Any hints how to use that to find out telemetry per table?
There are two tables we can use for getting table-level stats, the sys.telemetry_wal
table and the tables()
virtual table.
The sys.telemetry_wal
table has metrics for the rowCount and physicalRowCount, and uses a tableId to identify the table. By joining against tables()
we can get details from the parent table.
The telemetry table keeps tracks of different event types, and the one we are interested on is WAL TXT Data Apply
, which is event type 105 We can write a query like:
select t.table_name, created, tableId, sum(physicalRowCount) physicalRowCount, sum(rowCount) rowCount, sum(physicalRowCount) / sum(rowCount) as amplification, count() as transactions
from sys.telemetry_wal tw
join tables() t on t.id = tw.tableId
and event = 105
where created >= dateadd('d', -1, now())
SAMPLE BY 1h
ORDER BY amplification desc, created desc;
curl -G "http://localhost:9000/exp" --data-urlencode "query=select t.table_name, created, tableId, sum(physicalRowCount) physicalRowCount, sum(rowCount) rowCount, sum(physicalRowCount) / sum(rowCount) as amplification, count() as transactions
from sys.telemetry_wal tw
join tables() t on t.id = tw.tableId
and event = 105
where created >= dateadd('d', -10, now())
SAMPLE BY 1h
ORDER BY amplification desc, created desc;"
And I can see the tables causing the most amplification
"table_name","created","tableId","physicalRowCount","rowCount","amplification","transactions"
"UST_trades","2025-02-26T12:00:00.000000Z",62,51047982,820,62253,187
"UST_prices","2025-02-26T12:00:00.000000Z",61,3262798,117,27887,117
"iot_data","2025-02-26T12:00:00.000000Z",66,3857148,1000000,3,21
"iot_data","2025-02-26T11:00:00.000000Z",66,4239298,2000000,2,42
"UST_prices","2025-02-26T11:00:00.000000Z",61,851812,365504,2,61
"UST_trades","2025-02-26T11:00:00.000000Z",62,5964671,2559678,2,75
"prometheus","2025-02-26T10:00:00.000000Z",7,4,4,1,1
"prometheus","2025-02-18T09:00:00.000000Z",7,6,6,1,1
"Treasury_Securities","2025-02-26T12:00:00.000000Z",60,200,0,,1
"Treasury_Securities","2025-02-26T11:00:00.000000Z",60,400,0,,2