The BigQuery earthquake
public dataset has 47 columns most of each have missing values. I need an output showing a summary with column_name
, total_entries
, non_missing_entries
& percentage_missing
as the columns of that table.
Currently I am using the query shown here, repeating the blocks for all the 47 columns:
SELECT
'id' AS column_name,
COUNT(id) AS non_missing_entries,
(COUNT(*) - COUNT(id)) * 100.0 / COUNT(*) AS percentage_missing
FROM
`youtube-factcheck.earthquake_analysis.earthquakes_copy`
UNION ALL
SELECT
'flag_tsunami' AS column_name,
COUNT(flag_tsunami) AS non_missing_entries,
(COUNT(*) - COUNT(flag_tsunami)) * 100.0 / COUNT(*) AS percentage_missing
FROM
`youtube-factcheck.earthquake_analysis.earthquakes_copy`
UNION ALL
-- Repeat the above block for other columns
-- ...
Output:
| column_name|non_missing_entries | percentage_missing|
| -----------| -------------------| ------------------|
|flag_tsunami| 1869| 70.20564323290291|
| id| 6273| 0|
| ...| ...| ...|
Is there a SQL to avoid the long tedious work of writing 47 blocks of the same query?
UNPIVOT
is your friend. (note I had to change the source as I don't have access to bigquery-public-data.noaa_significant_earthquakes.earthquakes
)
with cte as (
select column_name,
count(*) as non_missing_entries
from (
select cast(id as string) as id,
flag_tsunami,
cast(year as string) as year,
cast(month as string) as month,
cast(day as string) as day,
cast(hour as string) as hour,
cast(minute as string) as minute,
cast(second as string) as second
from `bigquery-public-data.noaa_significant_earthquakes.earthquakes`)
unpivot ( value for column_name in (id, flag_tsunami,year,month,day,hour,minute,second))
group by column_name
),
id_only as (
select column_name,non_missing_entries
from cte
where column_name = 'id'
)
select cte.column_name,
cte.non_missing_entries,
(id_only.non_missing_entries - cte.non_missing_entries) * 100.0 / id_only.non_missing_entries as percentage_missing
from cte
cross join id_only;
it returns this:
You'll have to:
but I think that's better than UNIONing 47 times.