pythonsqldata-scienceduckdbdata-engineering

How do I do a specific aggregation on a table based on row column values on another table (SQL)?


I have loaded two fact tables CDI and Population and a couple dimension tables in DuckDB. I did joins on the CDI fact table and its respective dimension tables which yields a snippet of the table below CDI table merged

And below is the Population fact table merged with its other dimension tables yielding this snippet below Population table merged enter image description here

Now what I want to basically do is filter out the Population table based only on the values of this particular row of the CDI table. In this case the current row outlined in green will somehow do this query

SELECT Year, SUM(Population) AS TotalPopulation
    FROM Population
    WHERE (Year BETWEEN 2018 AND 2018) AND
    (Age BETWEEN 18 AND 85) AND
    State = 'Pennsylvania' AND
    Sex IN ('Male', 'Female') AND
    Ethnicity IN ('Multiracial') AND
    Origin IN ('Not Hispanic')
    GROUP BY Year
    ORDER BY Year ASC

This query aggregates the Population column values based on the row values of the CDI table. What I'm just at a loss in trying to implement is doing this aggregation operation for all row values in the CDI table. Here is a full visualization of what I'm trying to do. enter image description here

How would I implement this type of varying filtering aggregation based on each row column values of the CDI table? I'm using DuckDB as the OLAP DB here so ANSI SQL is what I'm trying to use to implement this task. Could it be possible only using this kind of SQL?


Solution

  • I agree with Chris Maurer comment, here is a SQL query to achieve what you are looking for :

    SELECT YearStart, YearEnd, LocationDesc, AgeStart, AgeEnd, Sex, Ethnicity, Origin, Sun(Population) AS TotalPopulation
    FROM CDI 
    LEFT JOIN Population AS pop
    ON
    (pop.Year BETWEEN CDI.YearStart AND CDI.YearEnd)
    AND 
    (CDI.Sex=pop.Sex OR CDI.Sex='both') 
    AND 
    (pop.Age BETWEEN pop.AgeStart AND (CASE WHEN pop.AgeEnd='infinity' THEN 1000 ELSE pop.AgeEnd END))
    AND
    (CDI.LocationDesc = pop.State)
    AND
    (CDI.Ethnicity=pop.Ethnicity OR CDI.Ethnicity='All')
    AND
    (CDI.Origin=pop.Origin OR CDI.Origin='Both')
    GROUP BY 1,2,3,4,5,6,7,8
    ORDER BY 9 DESC
    

    Hope this helps.