I have data that looks like this:
serialized_book:
id | author | title
---+--------+---------------------------
0 | foo | i am foo.
1 | bar | the derivabloviating . . .
...
chapter_pub_date:
id | chapter | year | month | day | book_id
---+---------+------+-------+-----+---------
0 | 1 | 2024 | 12 | 10 | 0
1 | 2 | 2025 | 1 | 5 | 0
...
27 | 1 | 2024 | 12 | 10 | 1
...
I want a query that yields the number of unique authors who published something in each month (I know the data isn't fully normalized, since I don't have a separate table for "author").
So something like this:
month | unique_author_count
--------+--------------------
2024-12 | 2
2025-1 | 1
...
The best I've got so far is
select
w.author, concat(p.year, "-", p.month) as yr_mo
from
serialized_book w
join
chapter_pub_date p on w.id = p.book_id
group by
w.author, yr_mo
order by
yr_mo desc
This returns a list of every unique (author, year-month-they-published-something) pair:
author | yr_mo
-------+----------
foo | 2024-12
foo | 2025-1
bar | 2024-12
...
But I'm stuck going from this to
yr_mo | count_unique_author_per_yr_mo
--------+-------------------------------
2024-12 | 2
...
I think it's going to require either some kind of partition or a sub-query, but none of the examples I've found in the docs and on SO feature joins, which complicates the sub-query solution.
This is actually simpler than you are making it out to be. Put the DISTINCT qualifier inside the author count and group by year and month only (not by author); you won't need a subquery at all.
Select p.year, p.month, Count(Distinct w.author) As unique_author_count
From serialized_book w Inner Join chapter_pub_date p
On w.id = p.book_id
Group By p.year, p.month
Order By p.year Desc, p.month Desc
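If you want to check this against the sample rows from the question, here is a minimal sketch that loads them into an in-memory SQLite database from Python and runs the same query. The choice of SQLite is an assumption made only so the example is self-contained (the question doesn't say which database is in use), and the column types are guesses. It also builds the yr_mo label with SQLite's `||` operator rather than concat(); on your database you would keep whatever concatenation you already use.

```python
import sqlite3

# In-memory database holding only the sample rows shown in the question.
# Column types are assumed; the question does not give the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE serialized_book (
        id INTEGER PRIMARY KEY, author TEXT, title TEXT
    );
    CREATE TABLE chapter_pub_date (
        id INTEGER PRIMARY KEY, chapter INTEGER,
        year INTEGER, month INTEGER, day INTEGER, book_id INTEGER
    );
    INSERT INTO serialized_book VALUES
        (0, 'foo', 'i am foo.'),
        (1, 'bar', 'the derivabloviating . . .');
    INSERT INTO chapter_pub_date VALUES
        (0, 1, 2024, 12, 10, 0),
        (1, 2, 2025, 1, 5, 0),
        (27, 1, 2024, 12, 10, 1);
""")

# Same idea as the answer: group by year and month only, and let
# COUNT(DISTINCT ...) collapse repeated authors within each group.
query = """
    SELECT p.year || '-' || p.month AS yr_mo,
           COUNT(DISTINCT w.author) AS unique_author_count
    FROM serialized_book w
    JOIN chapter_pub_date p ON w.id = p.book_id
    GROUP BY p.year, p.month
    ORDER BY p.year DESC, p.month DESC
"""
rows = conn.execute(query).fetchall()

for yr_mo, unique_author_count in rows:
    print(yr_mo, unique_author_count)
```

Sorting on the numeric year and month columns, rather than on the yr_mo string, keeps the months in calendar order; as text, "2025-10" would sort before "2025-2".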