postgresqlamazon-quicksighthyperloglog

How to make QuickSight visualize HyperLogLog (HLL) data?


I have a Spark cluster job that can produce HLL data for unique buyers of products across all stores and months. How can we make QuickSight visualize this?


Solution

  • PostgreSQL DB has HLL extension. Data should be saved from Spark to PostgreSQL and then QuickSight should direct query the DB using parameters exposed from Dataset and wired to Analysis controls.

    Example custom query:

    select hll_cardinality(hll_union_agg(hll_id_customers)) as "Distinct clients" 
    from hll_test_dataset3 h where 
      and (case when << $Store >> = ' All' then true when << $Store >> != ' All' then store in(<< $Store >>) end)
      and (case when << $StartOfMonth >> = '2000-01-01' then true when << $StartOfMonth >> != '2000-01-01' then TO_DATE(month || '-01', 'YYYY-MM-DD') >= << $StartOfMonth >> end) 
      and (case when << $EndOfMonth >> = '2100-01-01' then true when << $EndOfMonth >> != '2100-01-01' then TO_DATE(month || '-01', 'YYYY-MM-DD') < << $EndOfMonth >> end) 
    group by 1
    

    Don't forget to check data interability: verify that same query in Spark and PostgreSQL return same result.