sqldatabasetime-seriesquestdb

Get totals for top 5 elements in GROUP BY, and then the rest of the groups in a single total


I have a question. I have the following scheme

CREATE TABLE 'redirect3' (
  id INT,
  short_url_id INT,
  browser SYMBOL capacity 256 CACHE,
  platform SYMBOL capacity 256 CACHE,
  os SYMBOL capacity 256 CACHE,
  referrer_domain VARCHAR,
  country SYMBOL capacity 256 CACHE,
  language SYMBOL capacity 256 CACHE,
  time TIMESTAMP
) timestamp (time) PARTITION BY MONTH WAL;

You can generate some pseudo-random data for this table like this:

INSERT INTO redirect3 (
    id, 
    short_url_id, 
    browser, 
    platform, 
    os, 
    referrer_domain, 
    country, 
    language, 
    time
)
SELECT
    x AS id,
    rnd_int(1, 100, 0) AS short_url_id,
    rnd_symbol('Chrome', 'Firefox', 'Safari', 'Edge', 'Opera', 'Internet Explorer', 'Brave') AS browser,
    rnd_symbol('Windows', 'Mac', 'Linux', 'Android', 'iOS') AS platform,
    rnd_symbol('Windows 10', 'macOS', 'Ubuntu', 'Android 11', 'iOS 14') AS os,
    rnd_str('google.com', 'facebook.com', 'twitter.com', 'linkedin.com', 'instagram.com', 'youtube.com', 'reddit.com') AS referrer_domain,
    rnd_symbol('US', 'UK', 'DE', 'FR', 'JP', 'IN', 'BR') AS country,
    rnd_symbol('en', 'es', 'de', 'fr', 'ja', 'hi', 'pt') AS language,
    timestamp_sequence(to_timestamp('2023-01-01', 'yyyy-MM-dd'), 1000000000) AS time -- 1 second intervals
FROM long_sequence(1000, 123456789, 987654321) x;

For example, I would now like to display the top 5 browsers and then others with the remaining values.

I see 2 options for this

I use SELECT count(), browser FROM redirect3 and truncate and sum after the 5th value.

I fetch the top 5 and the total number.

With the pseudo random data above, the results should be

Browser Total
-Others- 256
Firefox 160
Internet Explorer 152
Edge 152
Safari 141
Chrome 139

I would actually like to do the same for other fields like os and country.

Does questdb allow me to do all of this in a query or how would you implement it?


Solution

  • We can do something like this (this query can be executed at the demo instance https://demo.questdb.io)

    with totals AS 
    (
    select 
    symbol
    , count() as total from trades
    ), ranked AS (
    SELECT *, rank() over (order by total DESC) as ranking from totals
    )
    select case when ranking <= 5 THEN 
    symbol
     else '-Others-' end, SUM(total)
    from ranked order by 2 DESC;
    

    For the table in the question, it would be

    with totals AS 
    (
    select 
    browser
    , count() as total from redirect3
    ), ranked AS (
    SELECT *, rank() over (order by total DESC) as ranking from totals
    )
    select case when ranking <= 5 THEN 
    browser
     else '-Others-' end, SUM(total)
    from ranked order by 2 DESC;