chartssnowflake-cloud-data-platform

Create a curve of data based off an input


i'm trying to visualise a series of data on a chart, ideally creating something that would show how the distribution lies. The data is in a snowflake db, and I've attempted to do this using percentiles. I've calculated the value at 10 % increments. My current query is as below, but this doesn't seem to give me a great output to visualise

select
APPROX_PERCENTILE(field_name, 0.1) "10_percent_value", 
APPROX_PERCENTILE(field_name, 0.2) "20_percent_value", 
APPROX_PERCENTILE(field_name, 0.3) "30_percent_value", 
etc.
from table.name

I'm sure i'm going about this the wrong way, but would like some help on alternative methods that could work please to be able to support a curve type visualisation. Hopefully this is enough information?


Solution

  • Too bad Snowflake's percentile function requires 2nd argument to be a literal constant; I was hoping for a way to pass a column name to it to make this solution a bit more programmatic.

    In terms of a getting a visualizable output, you need all percentile values to be in the same column. Here is what that would look like

    create or replace temporary table percentiles (percentile_value int) as
    
    select approx_percentile(field_name,0.1) from t 
    union all
    select approx_percentile(field_name,0.2) from t 
    union all
    select approx_percentile(field_name,0.3) from t 
    union all
    select approx_percentile(field_name,0.4) from t 
    union all
    select approx_percentile(field_name,0.5) from t 
    union all
    select approx_percentile(field_name,0.6) from t 
    union all
    select approx_percentile(field_name,0.7) from t 
    union all
    select approx_percentile(field_name,0.8) from t 
    union all
    select approx_percentile(field_name,0.9) from t 
    union all
    select approx_percentile(field_name,1.0) from t;
    
    select *, row_number() over (order by percentile_value)||'0th' as percentile 
    from percentiles;
    

    If performance becomes an issue, you can use approx_percentile_accumulate() to store percentile state in a table and run calculations on that instead of the base table

    Sample Output:

    enter image description here enter image description here