sqljoingoogle-bigquery

conditional join in bigquery


I have two tables.

Table 1 is a single COLUMN of integers.

Table 2 has three COLUMNS : start_integer, end_integer, data

The simple query is to join the column of integers with data where

  integer >= start_integer AND integer <= end_integer

In many SQL implementations this can be accomplished with a left conditional JOIN ... ON BETWEEN

SELECT tbl1.integer, tbl2.data FROM tbl1
LEFT JOIN tbl2 ON tbl1.integer BETWEEN tbl2.start_integer AND 
tbl2.end_integer;

But it seems BigQuery supports only JOIN ON with only an = condition.

This could be accomplished with a cross join, but BigQuery complains that my tables are too big. CROSS JOIN EACH is invalid.

How can I accomplish this join task within the limitations of BigQuery's SQL?

Below is my BigQuery SQL:

SELECT tbl1.integer, tbl2.data
FROM bq:data.tbl1 
CROSS JOIN bq:data.tbl2
WHERE tbl1.integer BETWEEN tbl2.start_integer AND tbl2.end_integer;

Which returns the error:

Error: 4.1 - 4.132: The JOIN operator's right-side table must be a small table. Switch the tables if the left-side table is smaller, or use JOIN EACH if both tables are larger than the maximum described at https://cloud.google.com/bigquery/docs/reference/legacy-sql#joins.


Solution

  • Good news (2016)! BigQuery does support inequality joins now - make sure to uncheck the "use legacy SQL option".

    Example query:

    SELECT * 
    FROM (
      SELECT 1 x
    ) a JOIN (
      SELECT 2 y
    ) b
    ON a.x<b.y
    

    With legacy SQL:

    Error: ON clause must be AND of = comparisons of one field name from each table, ...
    

    With standard SQL:

    1     2
    

    enter image description here