snowflake-cloud-data-platformdata-profiling

Is it possible in snowflake to write a query that lists the columns that have all null values?


In snowsight within snowflake, you can profile tables and see the % of null values in the UI, but is there an easy way to query for this data or export it from the UI? I just need to create a new table off a table with 1k+, but exclude columns that only have null vaules?

Input:

CREATE OR REPLACE TABLE TAB(Q INT, X INT, Y INT, Z INT)
AS
SELECT NULL, 0041X00002OEkZ5QAL,2, NULL UNION ALL
SELECT NULL, NULL, NULL, NULL UNION ALL
SELECT NULL, NULL, NULL, NULL UNION ALL
SELECT NULL, NULL, 2, NULL;

Solution

  • Using Snowpark to remove all-nullable columns:

    import snowflake.snowpark as snowpark
    from snowflake.snowpark.functions import col, count
    
    def main(session: snowpark.Session): 
        # input
        df = session.table('PUBLIC.TAB')
    
        # columns with all nulls
        dropcols = [c for c in df.columns if df.agg(count(col(c))).collect()[0][0]==0]
    
        # save as new table without nulls columns
        df.drop(dropcols).write.save_as_table('PUBLIC.TAB_WITHOUT_NULL',mode='overwrite')
                
        return 'Success'
    

    For input:

    CREATE OR REPLACE TABLE TAB(Q INT, X INT, Y INT, Z INT)
    AS
    SELECT NULL, 1,2, NULL UNION ALL
    SELECT NULL, NULL, NULL, NULL UNION ALL
    SELECT NULL, 3, NULL, NULL UNION ALL
    SELECT NULL, NULL, 2, NULL;
    
    SELECT * FROM TAB;
    /*
    Q   X   Y   Z
        1   2   
            
        3       
            2   
    */
    

    Output:

    SELECT * FROM TAB_WITHOUT_NULL;
    /*
    X   Y
    1   2
        
    3   
        2
    */
    

    Related: