I have loaded two fact tables CDI
and Population
and a couple dimension tables in DuckDB. I did joins on the CDI
fact table and its respective dimension tables which yields a snippet of the table below
And below is the Population
fact table merged with its other dimension tables yielding this snippet below
Now what I want to basically do is filter out the Population
table based only on the values of this particular row of the CDI
table. In this case the current row outlined in green will somehow do this query
SELECT Year, SUM(Population) AS TotalPopulation
FROM Population
WHERE (Year BETWEEN 2018 AND 2018) AND
(Age BETWEEN 18 AND 85) AND
State = 'Pennsylvania' AND
Sex IN ('Male', 'Female') AND
Ethnicity IN ('Multiracial') AND
Origin IN ('Not Hispanic')
GROUP BY Year
ORDER BY Year ASC
This query aggregates the Population
column values based on the row values of the CDI
table. What I'm just at a loss in trying to implement is doing this aggregation operation for all row values in the CDI
table. Here is a full visualization of what I'm trying to do.
How would I implement this type of varying filtering aggregation based on each row column values of the CDI
table? I'm using DuckDB as the OLAP DB here so ANSI SQL is what I'm trying to use to implement this task. Could it be possible only using this kind of SQL?
I agree with Chris Maurer comment, here is a SQL query to achieve what you are looking for :
SELECT YearStart, YearEnd, LocationDesc, AgeStart, AgeEnd, Sex, Ethnicity, Origin, Sun(Population) AS TotalPopulation
FROM CDI
LEFT JOIN Population AS pop
ON
(pop.Year BETWEEN CDI.YearStart AND CDI.YearEnd)
AND
(CDI.Sex=pop.Sex OR CDI.Sex='both')
AND
(pop.Age BETWEEN pop.AgeStart AND (CASE WHEN pop.AgeEnd='infinity' THEN 1000 ELSE pop.AgeEnd END))
AND
(CDI.LocationDesc = pop.State)
AND
(CDI.Ethnicity=pop.Ethnicity OR CDI.Ethnicity='All')
AND
(CDI.Origin=pop.Origin OR CDI.Origin='Both')
GROUP BY 1,2,3,4,5,6,7,8
ORDER BY 9 DESC
Hope this helps.