prometheuspromql

Is there a way to express a true "left join" in PromQL?


Is there any way to express the equivalent to the SQL "left join" operation in PromQL with vector matching?

I want to join on info metrics, but I want to retain the non-info metric even if there is no matching info-metric. This does not seem to be possible without a lot of duplication in the queries and use of or operators. Duplicating both 'branches' of the query quickly becomes unmanageable if more than one possibly-missing info metric must be handled, and/or the left-hand side is a nontrivial expression.

E.g. if I have a primary metric important_metric and an info-metric workload_priorities_info, I want to enrich important_metric with labels from workload_priorities_info if there is a matching time series. But I want to always preserve the important_metric series in the output, whether or not there's a match.

Using the PromQL convention for joining on an info metric (the info function isn't ready yet), this will return important_metric if there's a matching workload_priorities_info. But if there's no match, it omits the important_metric entirely:

important_metric{}
# produce one output series for each LHS input series for which a RHS input series
# with the same (join_labels) exists
* on (join_labels)
  # copy RHS "info_label" label values to the output series
  group_left(info_label)
  # force the value of the input series to use on the RHS to 1, and
  # summarize it to discard any irrelevant labels so we don't encounter errors
  # if there are duplicates due to e.g. workload restarts on the service emitting
  # the info-metric.
  group by (join_labels, info_label) (
    workload_priorities_info{}
  )

This is analogous to an INNER JOIN with a GROUP BY on the LHS important_metric. I cannot find a way to express this like a LEFT JOIN where left-side rows with no right-side match are retained in the output.

With inputs

important_metric{join_labels="1"} 100
important_metric{join_labels="2"} 64
important_metric{join_labels="3"} 5

workload_priorities_info{join_labels="1",info_label="highprio"}
workload_priorities_info{join_labels="3",info_label="lowprio"}

this produces

{join_labels="1",info_label="highprio"} 100
{join_labels="3",info_label="lowprio"} 5

omitting the series with join_labels="2", where I instead want:

{join_labels="1",info_label="highprio"} 100
{join_labels="2"} 64
{join_labels="3",info_label="lowprio"} 5

Note that similar-sounding question Is there a way to do a "left outer join" like query in PromQL? is actually asking for an anti-join, which is often expressed as a left join with a filter on IS NULL for the RHS. So it's not the same thing.

See also:

Surprisingly, MetricsQL from VictoriaMetrics does not appear to solve this problem.


Solution

  • Best workaround: union with "or"

    The closest I've come uses set operations to execute the query as two branches - one for the "inner join" side, and one that's an anti-join that returns only LHS rows that don't match the RHS. This is very verbose if the LHS is a nontrivial expression instead of a simple stand-alone metric, and it's really inefficient too.

    (
      important_metric{}
      * on (join_labels)
        group_left(info_label)
        group by (join_labels, info_label) (
          workload_priorities_info{}
        )
    )
    # if there's no series on the LHS with the same labels as the RHS after ignoring
    # the "info_label" label on each side, add the RHS series to the output, otherwise
    # output the LHS series.
    or ignoring (info_label)
    important_metric{}
    

    with inputs

    important_metric{join_labels="1"} 100
    important_metric{join_labels="2"} 64
    important_metric{join_labels="3"} 5
    
    workload_priorities_info{join_labels="1",info_label="highprio"}
    workload_priorities_info{join_labels="3",info_label="lowprio"}
    

    this produces

    {join_labels="1",info_label="highprio"} 100
    {join_labels="2"} 64
    {join_labels="3",info_label="lowprio"} 5
    

    Based on Proposal: PromQL arithmetic with default value, or outer join (issue#13625) this is probably the best workaround that is currently possible.

    SQL analogues for comparison

    The SQL translation of this would be something like

    # inner join to find rows with matching info metrics
    SELECT
      important_metric.value * workload_priorities_info.value,
      important_metric.join_labels,
      workload_priorities_info.info_label
    FROM important_metric
    INNER JOIN workload_priorities_info USING (join_labels)
    # append two sets of rows
    UNION ALL
    # find rows that do NOT have any matching info-metric and append them to
    # the result set
    SELECT important_metric.value,
      important_metric.join_labels,
      NULL
    FROM important_metric
    LEFT JOIN INNER JOIN workload_priorities_info USING (join_labels)
    WHERE workload_priorities_info.info_label IS NULL
    

    where what we want is the equivalent of

    SELECT
      important_metric.value * coalesce(workload_priorities_info.value, 1),
      important_metric.join_labels,
      workload_priorities_info.info_label
    FROM important_metric
    LEFT JOIN workload_priorities_info USING (join_labels)