sqlpostgresqlgoogle-bigquerygoogle-cloud-platform

Correctly Migrate Postgres least() Behavior to BigQuery


I'm attempting to migrate postgres scripts over to bigquery, with the end goal of both scripts returning the exact same tables (schema and values).

I'm running into an issue when trying to replicate the behavior of least() in postgres in my bigquery selects.

In postgres, if any parameters of the least() call are null, they are skipped and the least non-null value is returned. In bigquery, however, if any of the parameters of the least() call are null, the function automatically returns null.

I'm looking for an elegant solution to replicate the postgres least() behavior in bigquery. My current—clunky—solution is below:

Postgres (returns -1):

SELECT LEAST(1, 0, -1, null)

BigQuery (returns null):

SELECT LEAST(1, 0, -1, null)

Postgres (returns -1):

SELECT LEAST(COALESCE(1, 0, -1, null),
             COALESCE(0, 1, -1, null),
             COALESCE(-1, 0, 1, null),
             COALESCE(null, 0, -1, 1))

BigQuery (returns -1):

SELECT LEAST(COALESCE(1, 0, -1, null),
             COALESCE(0, 1, -1, null),
             COALESCE(-1, 0, 1, null),
             COALESCE(null, 0, -1, 1))

This works but is a less-than-ideal solution.

In the original postgres script I need to migrate, there is nested logic like least(w, x, least(y, z)) so that fix gets exponentially more unreadable as the number of values/complexity grows. That same issue applies when you try to do this as a massive CASE block.

If anyone has an obvious fix that I'm missing or a more elegant way to mirror the postgres behavior in bigquery, it is much appreciated!


Solution

  • There is a simple workaround for BigQuery Standard SQL

    You just create your own function (let's say myLeast)
    It works for "standalone" as well as in nested scenario

    #standardSQL
    CREATE TEMP FUNCTION myLeast(x ARRAY<INT64>) AS
    ((SELECT MIN(y) FROM UNNEST(x) AS y));
    SELECT 
      LEAST(1, 0, -1, NULL) AS least_standard,  
      LEAST(COALESCE(1, 0, -1, NULL),
        COALESCE(0, 1, -1, NULL),
        COALESCE(-1, 0, 1, NULL),
        COALESCE(NULL, 0, -1, 1)) AS least_less_than_ideal,
      myLeast([1, 0, -1, NULL]) AS least_workaround,
      myLeast([1, 0, -1, NULL, myLeast([2, 0, -2, NULL])]) AS least_with_nested  
    

    Output is

    least_standard  least_less_than_ideal   least_workaround    least_with_nested    
    null            -1                      -1                  -2   
    

    first two is from your question - third and forth are "standalone" and nested workaround

    Hope you can apply this approach to your specific case