sql-serverssisetldata-profiling

How to loop through all tables and fields in each table to get percentage of missing values


I am trying to, using SSIS, obtain a table to get the percentage of missing values of every field in every table of a SQL Server database.

Ideally I would like to create a new table in another database with 4 fields

Table / Field / Type / PctMissingValues

with one row for each field of a table. Type would be the field type

My idea was to use foreach loop containers to loop through tables and fields, with inside the container a Data flow task consisting of OLE DB Source > Aggregate > OLE DB Destination but I can't figure out how to do this, I am new to SSIS and ETLs in general


Solution

  • The following SQL query generates one query per column in a database that counts total rows and rows where the value is NULL.

    You can load this in to a variable and loop through it in SSIS running the statement in each row one at a time and logging the results form that query out to another table.

    SELECT 
     OBJECT_SCHEMA_NAME(C.object_id) AS TableSchema
    ,OBJECT_NAME(C.object_id) AS TableName
    ,C.name AS ColumnName
    ,'SELECT COUNT(*) AS TotalRows, COUNT(IIF([' +C.name+ '] IS NULL,1,NULL)) AS NullRows 
        FROM [' + OBJECT_SCHEMA_NAME(C.object_id) + '].[' + OBJECT_NAME(C.object_id) + ']' AS CountQuery
    FROM sys.columns AS C
    INNER JOIN sys.tables AS T
        ON C.object_id = T.object_id