In SQL, given tables T
and S
that have the same schema, say,
CREATE TABLE "T" ("ID" INTEGER, "Salary" REAL);
CREATE TABLE "S" ("ID" INTEGER, "Salary" REAL);
we can union the two relations with the command
(SELECT * FROM "S") UNION (SELECT * FROM "T");
but the following command gives an error:
"S" UNION "T";
Error message in SQLite: near "UNION": syntax error
Error message in SQL Server: Incorrect syntax near the keyword 'UNION'.
Why can't we use the latter command? Algebraically speaking, the union operator is defined to take two table operands and output another table as a result. So "S" UNION "T"
must work. This is analogous to how the selection and projection operators take a table operand, and output another table.
I understand that while the output of a SELECT
statement should algebraically be a table, it is in practice a "virtual table". My understanding from C. J. Date's An Introduction to Database Systems (8th Ed) is that tables like S
and T
above are "base relvars/relations" whereas virtual tables like (SELECT * FROM "S")
and (SELECT * FROM "T")
are "derived relations"; and that both (base and derived) are considered tables. So why don't they act the same when UNION
operates on them?
Interestingly, after we give the names view1
and view2
to the outputs of (SELECT * FROM "S")
and (SELECT * FROM "T")
respectively to create views,
CREATE VIEW "view1" AS SELECT * FROM "S";
CREATE VIEW "view2" AS SELECT * FROM "T";
we will again get errors if we feed the views to UNION
as operands:
"view1" UNION "view2";
Error message in SQLite: near "UNION": syntax error
Error message in SQL Server: Incorrect syntax near 'view1'.
What's happening? Why must the operands necessarily be SELECT
statements?
While relational databases are built on top of relational algebra, they are not the same, especially actual implementations. Some versions of some RDBMS even don't conform to SQL standard (they are actually not required to fully support it to claim compatibility, some features are optional).
Anyway, SQL standard defines UNION operator the way you'd expect:
<non-join query expression> ::=
<non-join query term>
| <query expression> UNION [ ALL ] [ <corresponding spec> ] <query term>
if we unfold everything:
<query term>
::= <non-join query term>
<non-join query term>
::= <non-join query primary>
<non-join query primary>
::= <simple table>
<simple table>
::= <explicit table>
<explicit table>
::= TABLE <table name>
The
<explicit table>
TABLE <table name>
is equivalent to the<query expression>
( SELECT * FROM <table name> )
so, according to the standard you should be able to use syntax like:
TABLE T UNION TABLE S
but actual implementation of SQL language is up to RDBMS developers, I don't know any to support explicit table statements like this, btw, the same for UNION JOIN, so answer to your question Why should the operands for UNION be SELECT statements and not tables?
- because developers decided not to follow standard.