I have a table like the following:
userId | artistId | trackId
-------|----------|--------
user1  | artist1  | track1
user1  | artist1  | track1
user1  | artist1  | track1
user2  | artist1  | track1
user2  | artist2  | track2
user2  | artist2  | track2
....   | ....     | ....
What I need to do is this: for each user, select the tracks that the user listened to more than once. For example, in the table above, for user1 it should return track1 (all 3 rows), and for user2 it should return only track2 (both rows).
I know how to count specific values of a column for each user. For example, to count how many times each user listened to track1 and track2, I write this:
select userId,
       sum(case when trackId = 'track1' then 1 else 0 end) as track1_count,
       sum(case when trackId = 'track2' then 1 else 0 end) as track2_count
from populartracks_logs
group by userId;
But my table is large (about 5 million rows) and contains many tracks, so writing one column per track is not practical. How can I select, for each user, the tracks that appear more than once for that specific user? What can I try next?
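If you want to retrieve all matching rows (not just one row per user/track pair), you need a subquery that finds the repeated (userId, trackId) pairs, joined back to the original table. You can also add artistId to the inner query's select list, its group by, and the join condition.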
select p.userId, p.trackId, p.artistId
from populartracks_logs p
join (
    -- (userId, trackId) pairs that occur more than once
    select userId, trackId
    from populartracks_logs
    group by userId, trackId
    having count(*) > 1
) as m
  on m.userId = p.userId
 and m.trackId = p.trackId;
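To check the subquery-and-join approach, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database with Python's built-in sqlite3 module and runs the query. SQLite is an assumption here, since the question does not name the database engine. The sketch also runs an alternative that is not part of the answer above: a window function (count(*) over (partition by userId, trackId)) counts rows per pair without a join. That variant only works on databases with window-function support, such as SQLite 3.25 or later.

```python
import sqlite3

# Sample data copied from the table in the question (assumed column types: text).
rows = [
    ("user1", "artist1", "track1"),
    ("user1", "artist1", "track1"),
    ("user1", "artist1", "track1"),
    ("user2", "artist1", "track1"),
    ("user2", "artist2", "track2"),
    ("user2", "artist2", "track2"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table populartracks_logs (userId text, artistId text, trackId text)"
)
conn.executemany("insert into populartracks_logs values (?, ?, ?)", rows)

# Subquery + join: keep every row whose (userId, trackId) pair repeats.
join_query = """
select p.userId, p.trackId, p.artistId
from populartracks_logs p
join (
    select userId, trackId
    from populartracks_logs
    group by userId, trackId
    having count(*) > 1
) as m
  on m.userId = p.userId
 and m.trackId = p.trackId
order by p.userId, p.trackId
"""

# Alternative (window function): attach the per-pair row count to every row,
# then filter on it in the outer query. Requires window-function support.
window_query = """
select userId, trackId, artistId
from (
    select userId, trackId, artistId,
           count(*) over (partition by userId, trackId) as plays
    from populartracks_logs
) as t
where plays > 1
order by userId, trackId
"""

join_result = conn.execute(join_query).fetchall()
window_result = conn.execute(window_query).fetchall()

for row in join_result:
    print(row)
```

Both queries should return the three user1/track1 rows followed by the two user2/track2 rows. The single user2/track1 row is excluded because that pair occurs only once.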