I have a bunch of tables, several of which have hundreds of columns. I need to get a count of non-null values for each column, and I've been doing it manually. I would like to find a way to get the counts for all the columns in a table at once. I searched Stack Overflow and Google, but couldn't find an answer.
I tried this, but it just returns a value of 1 for each column. I know it's counting the number of columns rather than the values in each column. Any suggestions?
select count(COLUMN_NAME)
from information_schema.columns
where table_schema = 'schema_name'
and table_name = 'table_name'
group by COLUMN_NAME
COUNT(column_name) always gives you the count of NON NULL values.
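For example (a minimal sketch using a throwaway temporary table), count(*) counts every row while count(val) skips the NULLs:

CREATE TEMP TABLE t ( val integer );
INSERT INTO t VALUES (1), (NULL), (3);

-- count(*) counts rows, count(val) counts only non-null values
SELECT count(*) AS total_rows, count(val) AS non_null_vals FROM t;
--  total_rows | non_null_vals
-- ------------+---------------
--           3 |             2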
Create a generic function like this, which takes the schema name and the table name as arguments. Inside it I construct one SELECT statement per column, each returning the column name and its count of non-null values, join them together with UNION ALL, and execute the combined statement dynamically.
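For the employees table used in the execution example below, the generated statement looks like this (shown here only for the first two columns):

SELECT 'employee_id', count(employee_id) FROM public.employees
UNION ALL
SELECT 'first_name', count(first_name) FROM public.employees
-- ... one more "UNION ALL SELECT ..." per remaining column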
CREATE OR REPLACE FUNCTION public.get_count( TEXT, TEXT )
  RETURNS TABLE ( t_column_name TEXT, t_count BIGINT )
  LANGUAGE plpgsql
AS $BODY$
DECLARE
    p_schema        TEXT := $1;
    p_tabname       TEXT := $2;
    v_sql_statement TEXT;
BEGIN
    -- Build one "SELECT 'col', count(col) FROM schema.table" per column
    -- and glue the pieces together with UNION ALL.
    SELECT STRING_AGG( 'SELECT '''
                       || column_name
                       || ''','
                       || ' count('
                       || column_name
                       || ') FROM '
                       || table_schema
                       || '.'
                       || table_name
                     , ' UNION ALL ' ) INTO v_sql_statement
      FROM information_schema.columns
     WHERE table_schema = p_schema
       AND table_name   = p_tabname;

    -- Only run the statement if the table actually has columns.
    IF v_sql_statement IS NOT NULL THEN
        RETURN QUERY EXECUTE v_sql_statement;
    END IF;
END
$BODY$;
Execution
knayak=# select c.col, c.count from
public.get_count( 'public', 'employees' ) as c(col,count);
col | count
----------------+-------
employee_id | 107
first_name | 107
last_name | 107
email | 107
phone_number | 107
hire_date | 107
job_id | 107
salary | 107
commission_pct | 35
manager_id | 106
department_id | 106
(11 rows)
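One caveat: the plain concatenation above breaks on mixed-case or otherwise quoted identifiers. If that matters for your tables, the aggregation inside the function could be built with format() instead, so that the label literal and the identifiers are quoted safely; a sketch of that variant:

SELECT STRING_AGG(
         format('SELECT %L, count(%I) FROM %I.%I',
                column_name, column_name, table_schema, table_name),
         ' UNION ALL ') INTO v_sql_statement
  FROM information_schema.columns
 WHERE table_schema = p_schema
   AND table_name   = p_tabname;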