prometheus opentsdb thanos

Prometheus short-term label usage


I have a Prometheus alert that fires if a user session causes a lock on a resource. The problem is that sometimes multiple locks occur from different users, and the users are never the same. If I add a user label that only exists for a short time, will this still cause high cardinality, or is it not an issue since the label values are never the same user? My data is stored for 2d in Prometheus and long term in Thanos storage. I usually get 2 usernames a day, so I'd be adding two new time series a day.


Solution

  • A time series is uniquely identified by its name plus all of its label="value" pairs. So if you add a user="some_value" label to an existing time series, a new time series is created every time some_value changes, even if only a single sample is stored for the new time series (i.e. there are no short-term labels in Prometheus).

    If you are going to add a user="..." label to 10 metrics, e.g.:

    metric_1{user="..."}
    metric_2{user="..."}
    ...
    metric_10{user="..."}
    

    Then 10 new time series are created with each new user value. If you have 2 new users per day, then 10*2 = 20 new time series will be created per day. The number of new time series created over a given period is known as the churn rate.

    20 new time series per day is a very small churn rate for Prometheus, so it is safe to use the user label in this case. But if the user label is present in 10K metrics and gets 1K new values per day, then the daily churn rate would be 10K * 1K = 10M new time series per day. This is quite a large number (known as high churn rate), and it can cause high memory usage and slow queries on the Prometheus side.
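
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how Prometheus is implemented: the helper names `series_id` and `daily_churn` and the user values `alice` and `bob` are made up for the example.

```python
def series_id(name, labels):
    """Model a series identity as the metric name plus the full set of label pairs."""
    return (name, frozenset(labels.items()))


def daily_churn(metrics_with_label, new_values_per_day):
    """Estimate new time series per day caused by one label whose value keeps changing."""
    return metrics_with_label * new_values_per_day


# Same metric name, different user value: two distinct series.
# "alice" and "bob" are hypothetical user names.
first = series_id("metric_1", {"user": "alice"})
second = series_id("metric_1", {"user": "bob"})
print(first != second)  # True

print(daily_churn(10, 2))          # 20
print(daily_churn(10_000, 1_000))  # 10000000
```

Plugging in your own number of affected metrics and expected new label values per day gives a rough estimate of the churn before adding the label.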