I run into this situation from time to time, where I want to take a list of things and see if any of them exist in another list.
If you have a single number, it's simple enough to use a where clause to say something like:
where topic_id IN (select topic_id from articles where article_id = 101)
But what if, say, I have a user who's selected a list of categories they're interested in (1,2,7,25,66), and I want to pull the most recent 15 articles that fall under any of those topics, and a given article can belong to 1 or more topics? It doesn't work to say:
where (select topic_id from articles) IN (1,2,7,25,66)
order by topic_date desc
It's possible to loop over the right list in a scripting language, doing a separate query for 1,2,7,25, and 66, but that could get very inefficient if it were a longer list. Is there a relatively efficient way to do something like this in a single SQL query?
sample data:
articles
article_id Title Text
1 Today's IC flavor Today's IC will be strawberry.
2 Looking Sunny Today's going to be hot!
3 Good day to fly No flights have been delayed today, because the weather is great!
4 Game on! There will be an epic baseball game today
5 Hockey too? There will be a hockey game today, indoors.
6 Special Aftergame Event In Tokyo, there will be an ice cream party after the baseball game. Woot!
topics
topic_id topic_name
1 Food
2 Travel
3 Weather
4 sports
link_article_topic
article_id topic_id
1 1
2 3
3 2
3 3
4 4
5 4
6 1
6 2
6 4
Notice that some articles have records for more than 1 topic in link_article_topic. This is simplified - my real world case has a much more extensive list of topics and articles, and my real world user has a much longer list of preferred topics, but it's the same concept.
So say my user has preset their desired topics to (2,4), or Travel and Sports. Say also that the articles table has a date field for when the article was published. I want to get the 3 most recent articles, in descending date order, that belong to the Travel and Sports topics.
In my real-world case, I tried a query using EXISTS similar to below, but it timed out. Please correct me if I have the syntax wrong, but I believe the "a." is necessary for the WHERE in that EXISTS statement:
SELECT a.article_id, a.Title, a.Text
FROM Articles a
WHERE a.extraColumn1 IN (list)
AND a.extraColumn2 IN (anotherList)
AND EXISTS (SELECT article_id from link_article_topic
WHERE article_id = a.article_ID
AND topic_ID IN (2,4))
ORDER BY DATE DESC
Since you're using a mapping table to associate topics with articles (an excellent choice) you can use an INNER JOIN and a DISTINCT phase to remove duplicates
EDIT NOTE: This has been re-written to target MySQL at OP's request, here is a link to an online MySQL interpreter with the below code snippet https://extendsclass.com/mysql/825408f
/*BEGIN sample data*/
with cteTopics as (
SELECT *
FROM (VALUES ROW(1, 'Art'), ROW(2, 'Music'), ROW(3, 'Poetry')
, ROW(4, 'Literature'), ROW(5, 'Math'), ROW(6, 'Physics')
, ROW(7, 'Photography'), ROW(8, 'News'), ROW(9, 'Crafts')
, ROW(10, 'Sports'), ROW(11, 'History'), ROW(12, 'Tech')
) as tpc(TopicID, TopicName)
), cteArticles as (
SELECT *
FROM (VALUES ROW(1, 'A brief history of golf', '2020-01-14')
, ROW(2, 'Stop action pickleball photos', '2023-06-10')
, ROW(3, 'The physics of artistic photographs', '2022-07-15')
, ROW(4, 'Songs of War and Peace', '2022-09-21')
, ROW(5, 'Writing tech reviews', '2020-05-05')
) as art(ArticleID, ArticleTitle, PublishDate)
), cteArticleTags as (
SELECT *
FROM (VALUES ROW(1, 11), ROW(1, 10)
, ROW(2, 7), ROW(2, 10)
, ROW(3, 6), ROW(3, 1), ROW(3, 7)
, ROW(4, 2), ROW(4, 4)
, ROW(5, 12), ROW(5, 7), ROW(5, 4)
) as attr(ArticleID, TopicID)
), cteUserInterests as (
SELECT *
FROM (VALUES ROW(1, 2), ROW(1, 7), ROW(1, 10)
, ROW(2, 5), ROW(2, 10)
, ROW(3, 5), ROW(3, 6), ROW(3, 12)
) as attr(UserID, TopicID)
)/*END of sample data
Now build solution, look for the set of articles that match for each user*/
, cteMatch as (
SELECT DISTINCT U.UserID, A.ArticleID
FROM cteUserInterests as U
INNER JOIN cteArticleTags as T on U.TopicID = T.TopicID
INNER JOIN cteArticles as A on A.ArticleID = T.ArticleID
)
SELECT M.UserID, A.*
FROM cteMatch as M INNER JOIN cteArticles as A on A.ArticleID = M.ArticleID
ORDER BY M.UserID, A.PublishDate DESC
BTW, the output of this is as follows
UserID | ArticleID | ArticleTitle | PublishDate |
---|---|---|---|
1 | 2 | Stop action pickleball photos | 2023-06-10 |
1 | 4 | Songs of War and Peace | 2022-09-21 |
1 | 3 | The physics of artistic photographs | 2022-07-15 |
1 | 5 | Writing tech reviews | 2020-05-05 |
1 | 1 | A brief history of golf | 2020-01-14 |
2 | 2 | Stop action pickleball photos | 2023-06-10 |
2 | 1 | A brief history of golf | 2020-01-14 |
3 | 3 | The physics of artistic photographs | 2022-07-15 |
3 | 5 | Writing tech reviews | 2020-05-05 |