sqlsql-serverdistinct-values

Find rows without duplicates in columns


I have a large table [MasterTable] with 31 columns. For each [TrackID], there are 5 rows, all identical except two columns, [ResultFK] and [ParamFK], linked to two other tables.

My end goal is to extract [ResultFK] and [ParamFK] into another linked table, so [TrackID] becomes the primary key.

However, I am finding some records have different data in columns that should be identical.

I want to select ALL the rows of a [TrackID] that have any differences in any column, except [ResultFK] & [ParameterFK].

I'm using SQL Server.

Sample (not all columns listed):

TrackID CountryFK CRLFK CorineLUFK SubTypeFK ProvidedFK GridTypeFK ResultFK ParamFK
FR_Soil 11 1 35 13 155 6 1847 6
FR_Soil 11 1 35 13 155 6 17035 8
FR_Soil 11 1 35 14 155 6 37456 9
FR_Soil 11 1 35 13 170 7 5147 10
FR_Soil 11 1 35 13 170 7 32656 14
IT_Soil 12 3 14 25 143 8 2341 4
IT_Soil 12 3 14 25 143 8 2741 8
IT_Soil 12 3 14 25 143 8 2345 7
IT_Soil 12 3 14 25 143 8 229 3
IT_Soil 12 3 14 25 143 8 231 9

Desired result: I need to pull all the records for that TrackID to pass to my colleague, so they can determine which of the values is correct. For example, SubTypeFK may need to be all 13 or all 14.

I don't necessarily need to list all 31 columns, only those columns with differing values. but I don't know which ones they will be unless they are included in the query.

TrackID CountryFK CRLFK CorineLUFK SubTypeFK ProvidedFK GridTypeFK ResultFK ParamFK
FR_Soil 11 1 35 13 155 6 1847 6
FR_Soil 11 1 35 13 155 6 17035 8
FR_Soil 11 1 35 14 155 6 37456 9
FR_Soil 11 1 35 13 170 7 5147 10
FR_Soil 11 1 35 13 170 7 32656 14

Solution

  • This method gives you the data you need, just not in the format you showed in your question. You could pivot the results back out again but the results here maybe suitable for your needs anyway.

    DECLARE @masterTable TABLE (TrackID varchar(10), CountryFK INT, CRLFK INT, CorineLUFK int, SubTypeFK int, ProvidedFK int, GridTypeFK int, ResultFK int, ParamFK int)
    INSERT INTO @masterTable VALUES
    ('FR_Soil', 11, 1, 35, 13, 155, 6, 1847, 6),
    ('FR_Soil', 11, 1, 35, 13, 155, 6, 17035, 8),
    ('FR_Soil', 11, 1, 35, 14, 155, 6, 37456, 9),
    ('FR_Soil', 11, 1, 35, 13, 170, 7, 5147, 10),
    ('FR_Soil', 11, 1, 35, 13, 170, 7, 32656, 14),
    ('IT_Soil', 12, 3, 14, 25, 143, 8, 2341, 4),
    ('IT_Soil', 12, 3, 14, 25, 143, 8, 2741, 8),
    ('IT_Soil', 12, 3, 14, 25, 143, 8, 2345, 7),
    ('IT_Soil', 12, 3, 14, 25, 143, 8, 229, 3),
    ('IT_Soil', 12, 3, 14, 25, 143, 8, 231, 9)
    
    
    SELECT 
        TrackID, Property, Count(*) as PropertyValueCount
        FROM (
                SELECT DISTINCT TrackID, Property, PropertyValue
                FROM 
                (
                    SELECT DISTINCT TrackID, CountryFK, CRLFK, CorineLUFK, SubTypeFK, ProvidedFK, GridTypeFK 
                        FROM @masterTable
                ) m 
                UNPIVOT(
                    PropertyValue for Property IN (CountryFK, CRLFK, CorineLUFK, SubTypeFK, ProvidedFK, GridTypeFK)
                ) as UnP
        ) n
        GROUP BY TrackID, Property
        HAVING COUNT(*)>1
    

    This gives us the following results.

    TrackID Property PropertyValueCount
    FR_Soil GridTypeFK 2
    FR_Soil ProvidedFK 2
    FR_Soil SubTypeFK 2

    Here's a dbfiddle you can use to test it.

    All we're doing here is, starting in the middle and working outwards...

    Get a distinct set of all columns (except the ResultFK and ParamFK columns) Take this unpivot it Then get a distinct list of TrackID plus the new columns from the unpivot - Property (column name originally) and PropertyValue. Finally, we group by TrackID and property and return only those with a count >1