sqlsql-servert-sql

Join a date table to a data table to get counts per year/month


Assume a table filled with dates from 2010 to 2050.

Assume a data table (called Lease) with some data, and includes a date column, Move_Out_Date.

The goal is to count the records with Move Out Dates in each year/month.

A simple Group By gets the correct answer:

SELECT
    YEAR(move_out_date) MOYear,
    MONTH(move_out_date) MOMonth,
    COUNT(move_out_date) AS Count
FROM 
    lease l
WHERE
    YEAR(move_out_date) = '2019'
GROUP BY 
    YEAR(move_out_date),
    MONTH(move_out_date)
ORDER BY
    YEAR(move_out_date),
    MONTH(move_out_date)

This query produces this output:

YEAR MONTH COUNT
2019 9 2
2019 10 4
2019 12 3

What I am looking to get is this:

YEAR MONTH COUNT
2019 1 0
2019 2 0
2019 3 0
2019 4 0
2019 5 0
2019 6 0
2019 7 0
2019 8 0
2019 9 2
2019 10 4
2019 11 0
2019 12 3

I've been messing with cross joins and left outer joins to the date table but always end up with huge numbers for the counts.


Solution

  • The trick is to start from dates table first. Use this to create a projection of your desired years/months. Then JOIN from the projection to the lease table. Since the dates table is presumed complete, we can use an INNER JOIN, and don't need a LEFT or CROSS, though a LEFT would work.

    It's also good to think carefully about the conditional expression used for the JOIN. We want to avoid mutating the data in the Lease table, as this would need to happen on every row... even rows outside the range. How else would the database know if the result matches? It will also prevent index use, likely causing a full table scan when a more efficient operation would have been available.

    So we get this:

    WITH Months AS (
        -- create the projection
        select DATETRUNC(month, d.[Date]) MonthStart, dateadd(month, 1, datetrunc(month, d.[Date])) MonthEnd
        from [DatesTable] d
        where d.[Date] >= '20190101' and d.[Date] < '20200101'
        group by datetrunc(month, d.[Date])
    )
    SELECT YEAR(m.MonthStart) [Year], MONTH(m.MonthStart) [Month], COUNT(l.*) [Count]
    FROM Months m -- start from the projection
    -- and then add the Lease data
    INNER JOIN Lease l ON l.move_out_date >= m.MonthStart AND l.move_out_date < m.MonthEnd
    GROUP BY m.MonthStart