sqlamazon-web-servicesamazon-athenaprestotrino

How to duplicate same row X number of times based on the value of a column?


I have a dataset looks like this:

Link_number Houband Time Mean_speed Sample_Number
Link 1 8 8:00 52 2
Link 1 8 8:30 55 5
Link 2 9 9:00 20 3
Link 2 9 9:30 40 4

I need to do duplicate each row X number of times, X = Sample_Number. So my result will look like this:

Link_number Houband Time Mean_speed Sample_Number
Link 1 8 8:00 52 2
Link 1 8 8:00 52 2
Link 1 8 8:30 55 5
Link 1 8 8:30 55 5
Link 1 8 8:30 55 5
Link 1 8 8:30 55 5
Link 1 8 8:30 55 5
Link 2 9 9:00 20 3
Link 2 9 9:00 20 3
Link 2 9 9:00 20 3
Link 2 9 9:30 40 4
Link 2 9 9:30 40 4
Link 2 9 9:30 40 4
Link 2 9 9:30 40 4

So then I can calculate the percentile of speed in each hourband for each link.

I have been able to do this in Postgresql using generate_series:

select Link_number, Hourband, percentile_cont(0.85) within group (order by Mean_Speed) as Speed_85th
from table, generate_series(1,Sample_Number)

How do I do the same in Amazon Athena?


Solution

  • You can use unnest + sequence (which is kind of analog of generate_series based on what I see):

    -- sample data
    WITH dataset(Link_number, Houband, Time, Mean_speed, Sample_Number) as (
        values ('Link 1',   8,  '8:00', 52, 2),
            ('Link 1',  8,  '8:30', 55, 5),
            ('Link 2',  9,  '9:00', 20, 3),
            ('Link 2',  9,  '9:30', 40, 4)
    )
    
    -- query
    select Link_number, Houband, Time, Mean_speed, Sample_Number
    from dataset,
         unnest(sequence(1, Sample_Number)) as t(ignored);