Using SQL Let’s say we have a dataset of population by state (e.g., Vermont, 623,251, and so on), but we want to know the population by United States region (e.g., Midwest, 68,985,454). Could you describe how you would go about doing that?
--First I created a table with a state and population column.
CREATE TABLE states (
state VARCHAR(20),
population INT
);
--Then I uploaded a CSV file from census.gov that I cleaned up.
SELECT * FROM states;
--Created a temporary table to add in the region column.
DROP TABLE IF EXISTS temp_Regions;
CREATE TEMP TABLE temp_Regions (
state VARCHAR(20),
state_pop INT,
region VARCHAR(20),
region_pop INT
);
INSERT INTO temp_Regions
SELECT state, population
FROM states;
--Used CASE WHEN statements to put states in to their respective regions.
SELECT state,
CASE WHEN state IN ('Connecticut', 'Maine', 'Massachusetts', 'New Hampshire', 'Rhode Island', 'Vermont', 'New Jersey', 'New York', 'Pennsylvania') THEN 'Northeast'
WHEN state IN ('Illinois', 'Indiana', 'Michigan', 'Ohio', 'Wisconsin', 'Iowa', 'Kansas', 'Minnesota', 'Missouri', 'Nebraska', 'North Dakota', 'South Dakota') THEN 'Midwest'
WHEN state IN ('Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Maryland', 'North Carolina', 'South Carolina', 'Virginia', 'West Virginia', 'Alabama', 'Kentucky', 'Mississippi', 'Tennessee', 'Arkansas', 'Louisiana', 'Oklahoma', 'Texas') THEN 'South'
WHEN state IN ('Arizona', 'Colorado', 'Idaho', 'Montana', 'Nevada', 'New Mexico', 'Utah', 'Wyoming', 'Alaska', 'California', 'Hawaii', 'Oregon', 'Washington') THEN 'West'
END AS region, state_pop, region_pop
FROM temp_Regions;
--Now I'm stuck at this point. I'm unable to get data into the region_pop column. How do I get the sum of the populations by U.S. Region?
Let me know if you need further clarification on things. Thanks for your help y'all!
You can make use of analytical function sum() over(partition by) to achieve this
with data
as (
SELECT state
,CASE WHEN state IN ('Connecticut', 'Maine', 'Massachusetts', 'New Hampshire', 'Rhode Island', 'Vermont', 'New Jersey', 'New York', 'Pennsylvania') THEN 'Northeast'
WHEN state IN ('Illinois', 'Indiana', 'Michigan', 'Ohio', 'Wisconsin', 'Iowa', 'Kansas', 'Minnesota', 'Missouri', 'Nebraska', 'North Dakota', 'South Dakota') THEN 'Midwest'
WHEN state IN ('Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Maryland', 'North Carolina', 'South Carolina', 'Virginia', 'West Virginia', 'Alabama', 'Kentucky', 'Mississippi', 'Tennessee', 'Arkansas', 'Louisiana', 'Oklahoma', 'Texas') THEN 'South'
WHEN state IN ('Arizona', 'Colorado', 'Idaho', 'Montana', 'Nevada', 'New Mexico', 'Utah', 'Wyoming', 'Alaska', 'California', 'Hawaii', 'Oregon', 'Washington') THEN 'West'
END AS region
, state_pop
FROM temp_Regions
)
select state
,region
,state_pop
,sum(state_pop) over(partition by region) as region_population
from data