I have a dataset looks like this:
Link_number | Houband | Time | Mean_speed | Sample_Number |
---|---|---|---|---|
Link 1 | 8 | 8:00 | 52 | 2 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 2 | 9 | 9:00 | 20 | 3 |
Link 2 | 9 | 9:30 | 40 | 4 |
I need to do duplicate each row X number of times, X = Sample_Number. So my result will look like this:
Link_number | Houband | Time | Mean_speed | Sample_Number |
---|---|---|---|---|
Link 1 | 8 | 8:00 | 52 | 2 |
Link 1 | 8 | 8:00 | 52 | 2 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 2 | 9 | 9:00 | 20 | 3 |
Link 2 | 9 | 9:00 | 20 | 3 |
Link 2 | 9 | 9:00 | 20 | 3 |
Link 2 | 9 | 9:30 | 40 | 4 |
Link 2 | 9 | 9:30 | 40 | 4 |
Link 2 | 9 | 9:30 | 40 | 4 |
Link 2 | 9 | 9:30 | 40 | 4 |
So then I can calculate the percentile of speed in each hourband for each link.
I have been able to do this in Postgresql using generate_series:
select Link_number, Hourband, percentile_cont(0.85) within group (order by Mean_Speed) as Speed_85th
from table, generate_series(1,Sample_Number)
How do I do the same in Amazon Athena?
You can use unnest
+ sequence
(which is kind of analog of generate_series
based on what I see):
-- sample data
WITH dataset(Link_number, Houband, Time, Mean_speed, Sample_Number) as (
values ('Link 1', 8, '8:00', 52, 2),
('Link 1', 8, '8:30', 55, 5),
('Link 2', 9, '9:00', 20, 3),
('Link 2', 9, '9:30', 40, 4)
)
-- query
select Link_number, Houband, Time, Mean_speed, Sample_Number
from dataset,
unnest(sequence(1, Sample_Number)) as t(ignored);