sqlverticavsql

Return Top N rows for each group (Vertica/vsql)


Familiar question, but with Vertica. I'd like to return the top 5 geo_country rows based on sum(imps) for each tag_id. This is the query I started:

SELECT tag_id,
       geo_country,
       SUM(imps) AS imps,
       RANK() OVER (PARTITION BY tag_id ORDER BY SUM(imps) DESC) AS rank
FROM table1
WHERE tag_id IN (2013150,1981153)
AND ymd > CURRENT_DATE - 3
GROUP BY 1,
         2 LIMIT 10;

This actually returns only rows from the first tag in the WHERE clause (2013150). I know that the other tag has sum(imps) values high enough which should include it in the results.

Also, how do I implement the Top N part? I tried adding a LIMIT clause within the OVER function, but it doesn't look like it is an accepted parameter.


Solution

  • Solved. The solution is to convert the query to a subquery and then use the WHERE clause to filter by rank:

    SELECT * 
    FROM (SELECT tag_id, geo_country, sum(imps),
        RANK() OVER (PARTITION BY tag_id ORDER BY SUM(imps) DESC) AS rank 
        FROM table1
        WHERE tag_id IN (2013150,1981153)
        AND ymd > CURRENT_DATE - 3
        GROUP BY 1,2) as t2
    WHERE t2.rank <=5;