I have a large table [MasterTable]
with 31 columns. For each [TrackID]
, there are 5 rows, all identical except two columns, [ResultFK]
and [ParamFK]
, linked to two other tables.
My end goal is to extract [ResultFK]
and [ParamFK]
into another linked table, so [TrackID]
becomes the primary key.
However, I am finding some records have different data in columns that should be identical.
I want to select ALL the rows of a [TrackID]
that have any differences in any column, except [ResultFK]
& [ParameterFK]
.
I'm using SQL Server.
Sample (not all columns listed):
TrackID | CountryFK | CRLFK | CorineLUFK | SubTypeFK | ProvidedFK | GridTypeFK | ResultFK | ParamFK |
---|---|---|---|---|---|---|---|---|
FR_Soil | 11 | 1 | 35 | 13 | 155 | 6 | 1847 | 6 |
FR_Soil | 11 | 1 | 35 | 13 | 155 | 6 | 17035 | 8 |
FR_Soil | 11 | 1 | 35 | 14 | 155 | 6 | 37456 | 9 |
FR_Soil | 11 | 1 | 35 | 13 | 170 | 7 | 5147 | 10 |
FR_Soil | 11 | 1 | 35 | 13 | 170 | 7 | 32656 | 14 |
IT_Soil | 12 | 3 | 14 | 25 | 143 | 8 | 2341 | 4 |
IT_Soil | 12 | 3 | 14 | 25 | 143 | 8 | 2741 | 8 |
IT_Soil | 12 | 3 | 14 | 25 | 143 | 8 | 2345 | 7 |
IT_Soil | 12 | 3 | 14 | 25 | 143 | 8 | 229 | 3 |
IT_Soil | 12 | 3 | 14 | 25 | 143 | 8 | 231 | 9 |
Desired result: I need to pull all the records for that TrackID
to pass to my colleague, so they can determine which of the values is correct. For example, SubTypeFK
may need to be all 13 or all 14.
I don't necessarily need to list all 31 columns, only those columns with differing values. but I don't know which ones they will be unless they are included in the query.
TrackID | CountryFK | CRLFK | CorineLUFK | SubTypeFK | ProvidedFK | GridTypeFK | ResultFK | ParamFK |
---|---|---|---|---|---|---|---|---|
FR_Soil | 11 | 1 | 35 | 13 | 155 | 6 | 1847 | 6 |
FR_Soil | 11 | 1 | 35 | 13 | 155 | 6 | 17035 | 8 |
FR_Soil | 11 | 1 | 35 | 14 | 155 | 6 | 37456 | 9 |
FR_Soil | 11 | 1 | 35 | 13 | 170 | 7 | 5147 | 10 |
FR_Soil | 11 | 1 | 35 | 13 | 170 | 7 | 32656 | 14 |
This method gives you the data you need, just not in the format you showed in your question. You could pivot the results back out again but the results here maybe suitable for your needs anyway.
DECLARE @masterTable TABLE (TrackID varchar(10), CountryFK INT, CRLFK INT, CorineLUFK int, SubTypeFK int, ProvidedFK int, GridTypeFK int, ResultFK int, ParamFK int)
INSERT INTO @masterTable VALUES
('FR_Soil', 11, 1, 35, 13, 155, 6, 1847, 6),
('FR_Soil', 11, 1, 35, 13, 155, 6, 17035, 8),
('FR_Soil', 11, 1, 35, 14, 155, 6, 37456, 9),
('FR_Soil', 11, 1, 35, 13, 170, 7, 5147, 10),
('FR_Soil', 11, 1, 35, 13, 170, 7, 32656, 14),
('IT_Soil', 12, 3, 14, 25, 143, 8, 2341, 4),
('IT_Soil', 12, 3, 14, 25, 143, 8, 2741, 8),
('IT_Soil', 12, 3, 14, 25, 143, 8, 2345, 7),
('IT_Soil', 12, 3, 14, 25, 143, 8, 229, 3),
('IT_Soil', 12, 3, 14, 25, 143, 8, 231, 9)
SELECT
TrackID, Property, Count(*) as PropertyValueCount
FROM (
SELECT DISTINCT TrackID, Property, PropertyValue
FROM
(
SELECT DISTINCT TrackID, CountryFK, CRLFK, CorineLUFK, SubTypeFK, ProvidedFK, GridTypeFK
FROM @masterTable
) m
UNPIVOT(
PropertyValue for Property IN (CountryFK, CRLFK, CorineLUFK, SubTypeFK, ProvidedFK, GridTypeFK)
) as UnP
) n
GROUP BY TrackID, Property
HAVING COUNT(*)>1
This gives us the following results.
TrackID | Property | PropertyValueCount |
---|---|---|
FR_Soil | GridTypeFK | 2 |
FR_Soil | ProvidedFK | 2 |
FR_Soil | SubTypeFK | 2 |
Here's a dbfiddle you can use to test it.
All we're doing here is, starting in the middle and working outwards...
Get a distinct set of all columns (except the ResultFK and ParamFK columns) Take this unpivot it Then get a distinct list of TrackID plus the new columns from the unpivot - Property (column name originally) and PropertyValue. Finally, we group by TrackID and property and return only those with a count >1