sqlsql-servert-sql

Understanding union query


I was wondering if someone can explain this to me and would be great if I can also get a source for it.

USE AP;

SELECT VendorName, VendorState
FROM Vendors
WHERE VendorState = 'CA'
UNION
SELECT VendorName, 'Outside CA'
FROM Vendors
WHERE VendorState <> 'CA'
Order BY VendorName

I can almost explain this in English, I know this query is saying

Select VendorName and VendorState from Vendors table
Show all vendorStates that are in California
join with

And this is where I get confused:

Select vendorName, 'outside CA' --Other than 'Outside Ca' being a string what else is it, is it a variable? And how does it know to replace the state value with 'Outside Ca' if vendorState value is not California.


Solution

  • There are 2 factors coming into play here:

    1. UNION adds multiple resultsets together (two in this case) i.e. takes the rows returned from multiple queries and combines them into a single resultset. As an aside you should use UNION ALL because it performs better. Note union is not a join and has nothing in common with a join. Joins are horizontal, unions are vertical. All queries in a union must have the same number columns with the same datatype. However you only need to alias the columns in the first query. I have aliased the second below so it can be run without the union.

    2. You can select static data as part of a query. 'Outside CA' is static, i.e. is returned as that for every row in that query, because as you have noted, we are excluding CA in that query.

    So you have 2 queries, the first is for state CA and the second is for states not CA, and for the second query instead of showing the state, it is showing the static value 'Outside CA'. These 2 queries are unioned together to give a single resultset.

    The best way to understand is to create an example - I highly recommend this for your future understanding e.g.

    DECLARE @Vendors TABLE (VendorName VARCHAR(32), VendorState CHAR(2));
    
    INSERT INTO @Vendors (VendorName, VendorState)
    VALUES 
    ('Test 1','CA'),
    ('Test 2','CA'),
    ('Test 3','NY'),
    ('Test 4','NY');
    
    SELECT VendorName, VendorState
    FROM @Vendors
    WHERE VendorState = 'CA'
    UNION ALL
    SELECT VendorName, 'Outside CA' VendorState
    FROM @Vendors
    WHERE VendorState <> 'CA'
    ORDER BY VendorName;
    

    If you remove the union you get the results from the 2 queries e.g.

    VendorName VendorState
    Test 1 CA
    Test 2 CA
    VendorName VendorState
    Test 3 Outside CA
    Test 4 Outside CA

    Then add back the union and you get see the effect:

    VendorName VendorState
    Test 1 CA
    Test 2 CA
    Test 3 Outside CA
    Test 4 Outside CA