I am wondering what the possible use cases are for the SELECT
AS STRUCT|VALUES
modifier for GoogleSQL beyond the sort of textbook examples in the documentation.
The AS STRUCT
reduces a non-scalar row into a scalar `STRUCT, so something like:
SELECT "David" AS name, 20 AS age
Could be converted into a subquery-able (scalar) item by doing something like:
SELECT (
SELECT AS STRUCT "David" AS name, 20 AS age
)
Or adding a bunch of debug info into a single (scalar) column by doing something like:
SELECT "David" AS name, 20 AS age,
(SELECT AS STRUCT 1 AS a, 2 AS b) debug
FROM (SELECT NULL)
But beyond that I can't see too much use, and I've never used it outside of trivial debugging queries myself. I was wondering if hopefully someone on the BigQuery team can explain:
The STRUCT
type is a container of ordered fields where each field has a type and name (optional). It can be used to combine multiple fields under one name.
But beyond that I can't see too much use, and I've never used it outside of trivial debugging queries myself.
Struct can be used to structure your data. For example:
CREATE TEMPORARY TABLE rectangles (
name string,
coordinates STRUCT<
topLeft STRUCT<x Int64, y int64>,
bottomRight STRUCT<x Int64, y int64>
>
);
INSERT INTO rectangles (name, coordinates) VALUES
('rectangle 1', ((10, 10), (90, 90)));
Here is the resulting table schema and data.
What are some actual examples when you use or find these two modifier keywords being used?
SELECT AS STRUCT
(and other struct constructors) create a struct from multiple values. This could be used to bypass some restrictions, for example, when the data must consist of 1 column x n rows:
SELECT ARRAY(
SELECT AS STRUCT gender, name
FROM bigquery-public-data.usa_names.usa_1910_2013
WHERE year = 2001 AND STate = 'WA'
ORDER BY number
LIMIT 5
)
Without struct, you will get the following error:
ARRAY subquery cannot have more than one column unless
using SELECT AS STRUCT to build STRUCT values
Another example where the IN clause subquery needs to return more than one column for tuple comparison:
SELECT *
FROM bigquery-public-data.usa_names.usa_1910_2013
WHERE (state, year, number) IN (
SELECT AS STRUCT state, year, MAX(number)
FROM bigquery-public-data.usa_names.usa_1910_2013
WHERE year BETWEEN 2001 AND 2003
GROUP BY state, year
)
Without struct, you will get the following error:
Subquery of type IN must have only one output column
Note that an IN clause subquery CAN return more than one column according to SQL standard (more on this below).
SELECT AS VALUE
operates on a one column select and that column must be a struct.
While the input for AS STRUCT
and AS VALUE
is different, the result for both is a value table — a one column table where that column is a struct type and has no name. You may substitute SELECT AS STRUCT
in the previous section with SELECT AS VALUE
and make necessary changes to the columns.
Do any other SQL dialects use that construction (or a similar construction) and if not, why did GoogleSQL need to support it?
The answer depends on what you're trying to do.
SELECT (1, 1, 0) < (1, 1, 1)
and WHERE (t1.a, t1.b, t1.c) IN (SELECT t2.a, t2.b, t2.c FROM t2 ...)
. Not all RDMBS implement this.