I was wondering if someone can explain this to me and would be great if I can also get a source for it.
USE AP;
SELECT VendorName, VendorState
FROM Vendors
WHERE VendorState = 'CA'
UNION
SELECT VendorName, 'Outside CA'
FROM Vendors
WHERE VendorState <> 'CA'
Order BY VendorName
I can almost explain this in English, I know this query is saying
Select VendorName and VendorState from Vendors table
Show all vendorStates that are in California
join with
And this is where I get confused:
Select vendorName, 'outside CA'
--Other than 'Outside Ca' being a string what else is it, is it a variable? And how does it know to replace the state value with 'Outside Ca' if vendorState value is not California.
There are 2 factors coming into play here:
UNION
adds multiple resultsets together (two in this case) i.e. takes the rows returned from multiple queries and combines them into a single resultset. As an aside you should use UNION ALL
because it performs better. Note union is not a join and has nothing in common with a join. Joins are horizontal, unions are vertical. All queries in a union must have the same number columns with the same datatype. However you only need to alias the columns in the first query. I have aliased the second below so it can be run without the union.
You can select static data as part of a query. 'Outside CA'
is static, i.e. is returned as that for every row in that query, because as you have noted, we are excluding CA in that query.
So you have 2 queries, the first is for state CA and the second is for states not CA, and for the second query instead of showing the state, it is showing the static value 'Outside CA'. These 2 queries are unioned together to give a single resultset.
The best way to understand is to create an example - I highly recommend this for your future understanding e.g.
DECLARE @Vendors TABLE (VendorName VARCHAR(32), VendorState CHAR(2));
INSERT INTO @Vendors (VendorName, VendorState)
VALUES
('Test 1','CA'),
('Test 2','CA'),
('Test 3','NY'),
('Test 4','NY');
SELECT VendorName, VendorState
FROM @Vendors
WHERE VendorState = 'CA'
UNION ALL
SELECT VendorName, 'Outside CA' VendorState
FROM @Vendors
WHERE VendorState <> 'CA'
ORDER BY VendorName;
If you remove the union you get the results from the 2 queries e.g.
VendorName | VendorState |
---|---|
Test 1 | CA |
Test 2 | CA |
VendorName | VendorState |
---|---|
Test 3 | Outside CA |
Test 4 | Outside CA |
Then add back the union and you get see the effect:
VendorName | VendorState |
---|---|
Test 1 | CA |
Test 2 | CA |
Test 3 | Outside CA |
Test 4 | Outside CA |