postgresql heroku datadog

Grok parser for Heroku Postgres metric log samples in Datadog?


Heroku sends Postgres metric logs as "samples" to Datadog. I want a parser that extracts the data from those logs so I can turn them into Datadog metrics. I had a pattern that worked for a while, but it broke recently because Heroku added additional metrics on May 15th, 2024.


Solution

  • I figured out how to make the parser tolerant of new fields being added (since I'm sure Heroku will do this again).

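    In "Advanced Settings", add a helper rule: _any_sample (?:(?:\s+sample#[^=]+=[^ ]+)?)+. It matches zero or more samples that I don't explicitly name, so placing %{_any_sample} between the named samples lets a rule skip over fields it doesn't know about.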
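To see why the helper makes a rule tolerant of extra fields, here is a minimal sketch in Python's `re` rather than Datadog grok. The sample log lines, the add-on name, and the two-field rule are made up for illustration, and `ANY_SAMPLE` uses the simpler `(...)*` form, which accepts the same input as the `(?:(...)?)+` helper above.

```python
import re

# Assumption: a simplified stand-in for the _any_sample helper rule.
# It skips zero or more " sample#key=value" pairs.
ANY_SAMPLE = r"(?:\s+sample#[^=]+=[^ ]+)*"

# Hypothetical two-field rule: db_size and tables, in that order,
# with ANY_SAMPLE before, between, and after the named samples.
RULE = re.compile(
    r"source=(?P<source>\S+)\s+addon=(?P<addon>\S+)"
    + ANY_SAMPLE
    + r"\s+sample#db_size=(?P<db_size>[\d.]+)bytes"
    + ANY_SAMPLE
    + r"\s+sample#tables=(?P<tables>[\d.]+)"
    + ANY_SAMPLE
    + r"$"
)


def parse(line):
    """Return the named fields, or None if the line does not match."""
    m = RULE.match(line)
    return m.groupdict() if m else None


# Made-up log lines: the second has extra samples the rule never names.
old_line = (
    "source=DATABASE addon=postgresql-example-1 "
    "sample#db_size=4469950648bytes sample#tables=14"
)
new_line = (
    "source=DATABASE addon=postgresql-example-1 "
    "sample#current_transaction=1234 sample#db_size=4469950648bytes "
    "sample#new-metric=0.5 sample#tables=14 sample#another=1"
)

print(parse(old_line))
print(parse(new_line))
```

Both lines yield the same extracted fields, because the unknown samples are absorbed by `ANY_SAMPLE`. A line that is missing a named sample (here `db_size`) still fails to match, which is why a fallback rule is useful.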

    I also have a few rules for the other sample types (dyno load, dyno memory, Redis).

    Then I use three different rules for Postgres:

    1. One for followers (includes the follower-lag-commits sample)
    2. One for the lead (without follower-lag-commits)
    3. A minimal one (also drops rollback-from), so I don't lose everything if Heroku removes or changes some samples

    The samples have to appear in the log in the same order as in the rule. The minimal rule will hopefully still match if Heroku makes small changes.
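The three-rule fallback can be sketched the same way. This is an illustration in Python, not Datadog's implementation: it assumes rules are tried top to bottom and the first match wins, and it uses shortened, hypothetical versions of the three Postgres rules with made-up log lines.

```python
import re

# Assumption: simplified stand-in for the _any_sample helper rule.
ANY = r"(?:\s+sample#[^=]+=[^ ]+)*"


def rule(*names):
    """Build a pattern requiring the named samples in order, tolerating extras."""
    body = "".join(
        ANY
        + r"\s+sample#" + re.escape(name)
        + r"=(?P<" + name.replace("-", "_") + r">[\d.]+)"
        for name in names
    )
    return re.compile(
        r"source=(?P<source>\S+)\s+addon=(?P<addon>\S+)" + body + ANY + r"$"
    )


# Most specific rule first: a follower line would also match the lead rule,
# because ANY absorbs follower-lag-commits as an unnamed sample.
RULES = [
    ("postgresFollower", rule("tables", "follower-lag-commits", "wal-percentage-used")),
    ("postgresLead", rule("tables", "wal-percentage-used")),
    ("postgresMinimal", rule("tables")),
]


def first_match(line):
    """Return (rule_name, fields) for the first rule that matches, else None."""
    for name, pattern in RULES:
        m = pattern.match(line)
        if m:
            return name, m.groupdict()
    return None


# Made-up log lines for illustration.
follower = ("source=FOLLOWER addon=postgresql-a sample#tables=14 "
            "sample#follower-lag-commits=3 sample#wal-percentage-used=0.06")
lead = ("source=DATABASE addon=postgresql-b sample#tables=14 "
        "sample#wal-percentage-used=0.06")
changed = ("source=DATABASE addon=postgresql-b sample#tables=14 "
           "sample#renamed-wal=0.06")

for line in (follower, lead, changed):
    print(first_match(line)[0])
```

The third line stands in for a future format change: the hypothetical renamed sample breaks the lead rule, but the minimal rule still extracts what it can.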

    Rules

    serverMetrics source\=%{notSpace:source}\.1\s+dyno\=heroku\.*\.%{notSpace:dyno}\s+sample#load_avg_1m\=%{number:load_avg_1m}\s+sample#load_avg_5m\=%{number:load_avg_5m}\s+sample#load_avg_15m\=%{number:load_avg_15m}
    
    memRuntimeMetrics source\=%{notSpace:source}\s+dyno\=%{notSpace:dyno}\s+sample#memory_total\=%{number:memory_total}MB\s+sample#memory_rss\=%{number:memory_rss}MB\s+sample#memory_cache\=%{number:memory_cache}MB\s+sample#memory_swap\=%{number:memory_swap}MB\s+sample#memory_pgpgin\=%{number:memory_pgpgin}pages\s+sample#memory_pgpgout\=%{number:memory_pgpgout}pages\s+sample#memory_quota\=%{number:memory_quota}MB
    
    redisMetrics source\=%{notSpace:source}\s+addon\=%{notSpace:addon}\s+sample#active-connections\=%{number:active_connections}\s+sample#load-avg-1m\=%{number:load_avg_1m}\s+sample#load-avg-5m\=%{number:load_avg_5m}\s+sample#load-avg-15m\=%{number:load_avg_15m}\s+sample#read-iops\=%{number:read_iops}\s+sample#write-iops\=%{number:write_iops}\s+sample#memory-total\=%{number:memory_total}kB\s+sample#memory-free\=%{number:memory_free}kB\s+sample#memory-cached\=%{number:memory_cached}kB\s+sample#memory-redis\=%{number:memory_redis}bytes\s+sample#hit-rate\=%{number:hit_rate}\s+sample#evicted-keys\=%{number:evicted_keys}
    
    postgresFollower source\=%{notSpace:source}\s+addon\=%{notSpace:addon}%{_any_sample} sample#current_transaction\=%{number:current_transaction}%{_any_sample} sample#db_size\=%{number:db_size}bytes%{_any_sample} sample#db-max-size\=%{number:db_max_size}bytes%{_any_sample} sample#db-size-percentage-used\=%{number:db_size_percentage_used}%{_any_sample} sample#tables\=%{number:tables}%{_any_sample} sample#active-connections\=%{number:active_connections}%{_any_sample} sample#waiting-connections\=%{number:waiting_connections}%{_any_sample} sample#max-connections\=%{number:max_connections}%{_any_sample} sample#connections-percentage-used\=%{number:connections_percentage_used}%{_any_sample} sample#index-cache-hit-rate\=%{number:index_cache_hit_rate}%{_any_sample} sample#table-cache-hit-rate\=%{number:table_cache_hit_rate}%{_any_sample} sample#load-avg-1m\=%{number:load_avg_1m}%{_any_sample} sample#load-avg-5m\=%{number:load_avg_5m}%{_any_sample} sample#load-avg-15m\=%{number:load_avg_15m}%{_any_sample} sample#read-iops\=%{number:read_iops}%{_any_sample} sample#write-iops\=%{number:write_iops}%{_any_sample} sample#max-iops\=%{number:max_iops}%{_any_sample} sample#iops-percentage-used\=%{number:iops_percentage_used}%{_any_sample} sample#tmp-disk-used\=%{number:tmp_disk_used}%{_any_sample} sample#tmp-disk-available\=%{number:tmp_disk_available}%{_any_sample} sample#memory-total\=%{number:memory_total}kB%{_any_sample} sample#memory-free\=%{number:memory_free}kB%{_any_sample} sample#memory-percentage-used\=%{number:memory_percentage_used}%{_any_sample} sample#memory-cached\=%{number:memory_cached}kB%{_any_sample} sample#memory-postgres\=%{number:memory_postgres}kB%{_any_sample} sample#follower-lag-commits\=%{number:follower_lag_commits}%{_any_sample} sample#wal-percentage-used\=%{number:wal_percentage_used}%{_any_sample} sample#rollback-from\=%{date("yyyy-MM-dd'T'HH:mmz"):rollback_from}%{_any_sample}
    
    postgresLead source\=%{notSpace:source}\s+addon\=%{notSpace:addon}%{_any_sample} sample#current_transaction\=%{number:current_transaction}%{_any_sample} sample#db_size\=%{number:db_size}bytes%{_any_sample} sample#db-max-size\=%{number:db_max_size}bytes%{_any_sample} sample#db-size-percentage-used\=%{number:db_size_percentage_used}%{_any_sample} sample#tables\=%{number:tables}%{_any_sample} sample#active-connections\=%{number:active_connections}%{_any_sample} sample#waiting-connections\=%{number:waiting_connections}%{_any_sample} sample#max-connections\=%{number:max_connections}%{_any_sample} sample#connections-percentage-used\=%{number:connections_percentage_used}%{_any_sample} sample#index-cache-hit-rate\=%{number:index_cache_hit_rate}%{_any_sample} sample#table-cache-hit-rate\=%{number:table_cache_hit_rate}%{_any_sample} sample#load-avg-1m\=%{number:load_avg_1m}%{_any_sample} sample#load-avg-5m\=%{number:load_avg_5m}%{_any_sample} sample#load-avg-15m\=%{number:load_avg_15m}%{_any_sample} sample#read-iops\=%{number:read_iops}%{_any_sample} sample#write-iops\=%{number:write_iops}%{_any_sample} sample#max-iops\=%{number:max_iops}%{_any_sample} sample#iops-percentage-used\=%{number:iops_percentage_used}%{_any_sample} sample#tmp-disk-used\=%{number:tmp_disk_used}%{_any_sample} sample#tmp-disk-available\=%{number:tmp_disk_available}%{_any_sample} sample#memory-total\=%{number:memory_total}kB%{_any_sample} sample#memory-free\=%{number:memory_free}kB%{_any_sample} sample#memory-percentage-used\=%{number:memory_percentage_used}%{_any_sample} sample#memory-cached\=%{number:memory_cached}kB%{_any_sample} sample#memory-postgres\=%{number:memory_postgres}kB%{_any_sample} sample#wal-percentage-used\=%{number:wal_percentage_used}%{_any_sample} sample#rollback-from\=%{date("yyyy-MM-dd'T'HH:mmz"):rollback_from}%{_any_sample}
    
    postgresMinimal source\=%{notSpace:source}\s+addon\=%{notSpace:addon}%{_any_sample} sample#current_transaction\=%{number:current_transaction}%{_any_sample} sample#db_size\=%{number:db_size}bytes%{_any_sample} sample#db-max-size\=%{number:db_max_size}bytes%{_any_sample} sample#db-size-percentage-used\=%{number:db_size_percentage_used}%{_any_sample} sample#tables\=%{number:tables}%{_any_sample} sample#active-connections\=%{number:active_connections}%{_any_sample} sample#waiting-connections\=%{number:waiting_connections}%{_any_sample} sample#max-connections\=%{number:max_connections}%{_any_sample} sample#connections-percentage-used\=%{number:connections_percentage_used}%{_any_sample} sample#index-cache-hit-rate\=%{number:index_cache_hit_rate}%{_any_sample} sample#table-cache-hit-rate\=%{number:table_cache_hit_rate}%{_any_sample} sample#load-avg-1m\=%{number:load_avg_1m}%{_any_sample} sample#load-avg-5m\=%{number:load_avg_5m}%{_any_sample} sample#load-avg-15m\=%{number:load_avg_15m}%{_any_sample} sample#read-iops\=%{number:read_iops}%{_any_sample} sample#write-iops\=%{number:write_iops}%{_any_sample} sample#max-iops\=%{number:max_iops}%{_any_sample} sample#iops-percentage-used\=%{number:iops_percentage_used}%{_any_sample} sample#tmp-disk-used\=%{number:tmp_disk_used}%{_any_sample} sample#tmp-disk-available\=%{number:tmp_disk_available}%{_any_sample} sample#memory-total\=%{number:memory_total}kB%{_any_sample} sample#memory-free\=%{number:memory_free}kB%{_any_sample} sample#memory-percentage-used\=%{number:memory_percentage_used}%{_any_sample} sample#memory-cached\=%{number:memory_cached}kB%{_any_sample} sample#memory-postgres\=%{number:memory_postgres}kB%{_any_sample} sample#wal-percentage-used\=%{number:wal_percentage_used}%{_any_sample}