sqlgoogle-cloud-platformgoogle-bigquery

BigQuery: How to call a stored procedure in where clause?


I have a stored procedure compare() that compares two strings and returns true or false. I want to use this procedure in aggregation like in the following statement:

SELECT * 
FROM myTable AS t1
INNER JOIN myTable2 AS t2
WHERE compare(t1.column1, t2.column2)

But I'm getting the following error:

Procedure is not supported in this context

Is there any hack to call stored procedure in the aggregation of a SQL statement?


Solution

  • According to the documentation, a Stored Procedure is used to execute a wide range of tasks, such running multiple queries in a sequence. Therefore, you can not use it within a WHERE clause. Also, in order to invoke a Stored procedure you have to use CALL `project_id.dataset.name`.

    You could use your stored to already perform the aggregation. It would be as follows,

    CREATE OR REPLACE PROCEDURE `project_id.dataset.compare`()
    BEGIN
      SELECT * 
      FROM myTable AS t1
      INNER JOIN myTable2 AS t2 ON t1.pk = t2.pk
      WHERE t1.column1 = t2.column2
    END;
    

    After creating the produce, invoke it (run it),

    CALL `project_id.dataset.compare`();
    

    Notice that the aggregation is done within the procedure. Thus, you do not need to use any other query, simply call your procedure to perform all the actions you needed.

    On the other hand, you could also use User Defined Functions(UDF), as Mykhail Beyrlant suggested. You can write a SQL UDF or a JavaScript UDF. Below there is one example for each,

    1) Using a SQL UDF.

    #creating the UDF
    CREATE TEMP FUNCTION compare(x STRING, y STRING) AS (
      IF(x=y,True, False) 
    );
    #sample data
    with data as (
    SELECT 'Ralph' as name, 12 as id UNION ALL
    SELECT 'Mary' as name, 13 as id 
    ), data2 as (
    SELECT 'Ralph' as name, 12 as id UNION ALL
    SELECT 'Robert' as name, 14 as id 
    )
    
    SELECT t1.name, t1.id FROM data t1 INNER JOIN data2 t2 USING(id)
    WHERE compare(t1.name,t2.name)
    

    And the output,

    Row name    id  
    1   Ralph   12
    

    2) Using a JavaScript UDF. It is generally used when the logic is more complex and you can not easily achieve it using SQL.

    #creating the UDF
    CREATE TEMP FUNCTION compare(x STRING, y STRING)
      RETURNS BOOL
      LANGUAGE js AS """
          if(x=y){return true;} else{return false;}
    """;  
    #sample data
    with data as (
    SELECT 'Ralph' as name, 12 as id UNION ALL
    SELECT 'Mary' as name, 13 as id 
    ), data2 as (
    SELECT 'Ralph' as name, 12 as id UNION ALL
    SELECT 'Robert' as name, 14 as id 
    )
    
    SELECT t1.name, t1.id FROM data t1 INNER JOIN data2 t2 USING(id)
    WHERE compare(t1.name,t2.name)
    

    And the output,

    Row name    id  
    1   Ralph   12
    

    Notice that both UDF return the same output. Moreover, you can choose from any of the three methods I exemplified above.