Is there any way to express the equivalent to the SQL "left join" operation in PromQL with vector matching?
I want to join on info metrics, but I want to retain the non-info metric even if there is no matching info-metric. This does not seem to be possible without a lot of duplication in the queries and use of or
operators. Duplicating both 'branches' of the query quickly becomes unmanageable if more than one possibly-missing info metric must be handled, and/or the left-hand side is a nontrivial expression.
E.g. if I have a primary metric important_metric
and an info-metric workload_priorities_info
, I want to enrich important_metric
with labels from workload_priorities_info
if there is a matching time series. But I want to always preserve the important_metric
series in the output, whether or not there's a match.
Using the PromQL convention for joining on an info metric (the info function isn't ready yet), this will return important_metric
if there's a matching workload_priorities_info
. But if there's no match, it omits the important_metric
entirely:
important_metric{}
# produce one output series for each LHS input series for which a RHS input series
# with the same (join_labels) exists
* on (join_labels)
# copy RHS "info_label" label values to the output series
group_left(info_label)
# force the value of the input series to use on the RHS to 1, and
# summarize it to discard any irrelevant labels so we don't encounter errors
# if there are duplicates due to e.g. workload restarts on the service emitting
# the info-metric.
group by (join_labels, info_label) (
workload_priorities_info{}
)
This is analogous to an INNER JOIN
with a GROUP BY
on the LHS important_metric
. I cannot find a way to express this like a LEFT JOIN
where left-side rows with no right-side match are retained in the output.
With inputs
important_metric{join_labels="1"} 100
important_metric{join_labels="2"} 64
important_metric{join_labels="3"} 5
workload_priorities_info{join_labels="1",info_label="highprio"}
workload_priorities_info{join_labels="3",info_label="lowprio"}
this produces
{join_labels="1",info_label="highprio"} 100
{join_labels="3",info_label="lowprio"} 5
omitting the series with join_labels="2"
, where I instead want:
{join_labels="1",info_label="highprio"} 100
{join_labels="2"} 64
{join_labels="3",info_label="lowprio"} 5
Note that similar-sounding question Is there a way to do a "left outer join" like query in PromQL? is actually asking for an anti-join, which is often expressed as a left join with a filter on IS NULL
for the RHS. So it's not the same thing.
See also:
Surprisingly, MetricsQL from VictoriaMetrics does not appear to solve this problem.
The closest I've come uses set operations to execute the query as two branches - one for the "inner join" side, and one that's an anti-join that returns only LHS rows that don't match the RHS. This is very verbose if the LHS is a nontrivial expression instead of a simple stand-alone metric, and it's really inefficient too.
(
important_metric{}
* on (join_labels)
group_left(info_label)
group by (join_labels, info_label) (
workload_priorities_info{}
)
)
# if there's no series on the LHS with the same labels as the RHS after ignoring
# the "info_label" label on each side, add the RHS series to the output, otherwise
# output the LHS series.
or ignoring (info_label)
important_metric{}
with inputs
important_metric{join_labels="1"} 100
important_metric{join_labels="2"} 64
important_metric{join_labels="3"} 5
workload_priorities_info{join_labels="1",info_label="highprio"}
workload_priorities_info{join_labels="3",info_label="lowprio"}
this produces
{join_labels="1",info_label="highprio"} 100
{join_labels="2"} 64
{join_labels="3",info_label="lowprio"} 5
Based on Proposal: PromQL arithmetic with default value, or outer join (issue#13625) this is probably the best workaround that is currently possible.
The SQL translation of this would be something like
# inner join to find rows with matching info metrics
SELECT
important_metric.value * workload_priorities_info.value,
important_metric.join_labels,
workload_priorities_info.info_label
FROM important_metric
INNER JOIN workload_priorities_info USING (join_labels)
# append two sets of rows
UNION ALL
# find rows that do NOT have any matching info-metric and append them to
# the result set
SELECT important_metric.value,
important_metric.join_labels,
NULL
FROM important_metric
LEFT JOIN INNER JOIN workload_priorities_info USING (join_labels)
WHERE workload_priorities_info.info_label IS NULL
where what we want is the equivalent of
SELECT
important_metric.value * coalesce(workload_priorities_info.value, 1),
important_metric.join_labels,
workload_priorities_info.info_label
FROM important_metric
LEFT JOIN workload_priorities_info USING (join_labels)