I am trying to use a complex nested dataset on Google BigQuery using the tutorial here: https://developers.google.com/web/tools/chrome-user-experience-report/getting-started
The data exported as json looks like this: https://gist.github.com/kmturley/c46eb3898d6ee62871f4402a4d2c6f7d
The example i'm using which works is: https://bigquery.cloud.google.com/savedquery/226352634162:c5b7ee9ea0394728a821cf60f58737c2
#standardSQL
SELECT
effective_connection_type.name AS ect,
SUM((
SELECT
SUM(bin.density)
FROM
UNNEST(first_contentful_paint.histogram.bin) bin
WHERE
bin.end <= 1000
AND origin = 'http://example.com')) AS density
FROM
`chrome-ux-report.chrome_ux_report.201710`
GROUP BY
ect
ORDER BY
density DESC
This outputs the data:
1 4G 0.6977
2 3G 0.056
3 slow-2G null
4 2G null
5 offline null
However I would like to output the bin.end value (time), when SUM(bin.density) reaches 0.7. My expected output would look something like this:
1 4G 1000
2 3G 50000
3 slow-2G null
4 2G null
5 offline null
Meaning on 4G connection, 70% (0.7) of page loads are less than 1.5 seconds (1500). I have tried to modify the script to be:
SELECT
SUM(bin.density)
WHERE
SUM(bin.density) <= 0.7
But this is not allowed, so tried:
SELECT
SUM(bin.density) AS RunningTotal
WHERE
RunningTotal <= 0.7
And also tried
SELECT
SUM(bin.density) OVER() AS RunningTotal
WHERE
RunningTotal <= 0.7
But this doesn't work either! How can I implement a running total with nested datasets? and have it output the bin.end time?
If I can't get nested datasets to work with SQL running totals, then my only other option is to flatten the dataset, and loop through each row with Python to calculate the result. Which is far less performant!
Updated: Solution based on answer by Felipe Hoffa
#standardSQL
SELECT origin, form, ect, `end`, density
FROM (
SELECT origin, form, ect, `end`, density, ROW_NUMBER() OVER(PARTITION BY ect ORDER BY `end` DESC) rn
FROM (
SELECT origin, form, ect, bin.end, SUM(bin.density) OVER(PARTITION BY ect ORDER BY `end`) AS density
FROM (
SELECT origin, form_factor.name form, effective_connection_type.name ect, first_contentful_paint.histogram
FROM `chrome-ux-report.chrome_ux_report.201710`
WHERE origin = 'http://example.com' AND form_factor.name = 'phone'
) , UNNEST(histogram.bin) AS bin
)
WHERE density < 0.7
)
WHERE rn=1
https://bigquery.cloud.google.com/savedquery/88263730615:ba1906e86b074511a804660ec973de37
With a cumulative SUM() and sorting the results by connection:
#standardSQL
SELECT ect, `end`, density
FROM (
SELECT ect, `end`, density, ROW_NUMBER() OVER(PARTITION BY ect ORDER BY `end` DESC) rn
FROM (
SELECT ect, bin.end, SUM(bin.density) OVER(PARTITION BY ect ORDER BY `end`) AS density
FROM (
SELECT effective_connection_type.name ect, first_contentful_paint.histogram
FROM `chrome-ux-report.chrome_ux_report.201710`
WHERE origin = 'http://example.com'
) , UNNEST(histogram.bin) AS bin
)
WHERE density < 0.7
)
WHERE rn=1