sqlgoogle-bigquerygoogle-cloud-spanner

SELECT AS STRUCT/VALUES


I am wondering what the possible use cases are for the SELECT AS STRUCT|VALUES modifier for GoogleSQL beyond the sort of textbook examples in the documentation.

The AS STRUCT reduces a non-scalar row into a scalar `STRUCT, so something like:

SELECT "David" AS name, 20 AS age

Could be converted into a subquery-able (scalar) item by doing something like:

SELECT  (
    SELECT AS STRUCT "David" AS name, 20 AS age
)

Or adding a bunch of debug info into a single (scalar) column by doing something like:

SELECT "David" AS name, 20 AS age, 
    (SELECT AS STRUCT 1 AS a, 2 AS b) debug
FROM (SELECT NULL)

But beyond that I can't see too much use, and I've never used it outside of trivial debugging queries myself. I was wondering if hopefully someone on the BigQuery team can explain:


Solution

  • The STRUCT type is a container of ordered fields where each field has a type and name (optional). It can be used to combine multiple fields under one name.

    But beyond that I can't see too much use, and I've never used it outside of trivial debugging queries myself.

    Struct can be used to structure your data. For example:

    CREATE TEMPORARY TABLE rectangles (
      name string,
      coordinates STRUCT<
        topLeft STRUCT<x Int64, y int64>,
        bottomRight STRUCT<x Int64, y int64>
      >
    );
    
    INSERT INTO rectangles (name, coordinates) VALUES
    ('rectangle 1', ((10, 10), (90, 90)));
    

    Here is the resulting table schema and data.

    What are some actual examples when you use or find these two modifier keywords being used?

    SELECT AS STRUCT (and other struct constructors) create a struct from multiple values. This could be used to bypass some restrictions, for example, when the data must consist of 1 column x n rows:

    SELECT ARRAY(
      SELECT AS STRUCT gender, name
      FROM bigquery-public-data.usa_names.usa_1910_2013
      WHERE year = 2001 AND STate = 'WA'
      ORDER BY number
      LIMIT 5
    )
    

    Without struct, you will get the following error:

    ARRAY subquery cannot have more than one column unless
    using SELECT AS STRUCT to build STRUCT values
    

    Another example where the IN clause subquery needs to return more than one column for tuple comparison:

    SELECT *
    FROM bigquery-public-data.usa_names.usa_1910_2013
    WHERE (state, year, number) IN (
      SELECT AS STRUCT state, year, MAX(number)
      FROM bigquery-public-data.usa_names.usa_1910_2013
      WHERE year BETWEEN 2001 AND 2003
      GROUP BY state, year
    )
    

    Without struct, you will get the following error:

    Subquery of type IN must have only one output column
    

    Note that an IN clause subquery CAN return more than one column according to SQL standard (more on this below).

    SELECT AS VALUE operates on a one column select and that column must be a struct.

    While the input for AS STRUCT and AS VALUE is different, the result for both is a value table — a one column table where that column is a struct type and has no name. You may substitute SELECT AS STRUCT in the previous section with SELECT AS VALUE and make necessary changes to the columns.

    Do any other SQL dialects use that construction (or a similar construction) and if not, why did GoogleSQL need to support it?

    The answer depends on what you're trying to do.