How to match a set of sets against a single set and bind that to the corresponding rows?
Given a row that has a linked summary table with key/value pairs describing properties of that row, and a bunch of search-descriptions (target) that describes how to sum up stuff from that row, how can I find which search-descriptions match a given row, based on matching the properties-table against the key/value pairs in the search-description?
CREATE TABLE TargetKeyValue(TargetId INT, TargetKey NVARCHAR(50), TargetValue NVARCHAR(50))
CREATE TABLE OriginalRows(Id INT, Cost DECIMAL, BunchOfOtherCols NVARCHAR(500),
CONSTRAINT [PK_Id] PRIMARY KEY CLUSTERED ([Id] ASC))
CREATE TABLE RowKeyValue(RowId INT, KeyPart NVARCHAR(50), ValuePart NVARCHAR(50),
CONSTRAINT [FK_RowId_Id] FOREIGN KEY (RowId) REFERENCES OriginalRows(Id))
INSERT INTO OriginalRows VALUES
(1, 55.5, 'Some cool red coat'),
(2, 80.0, 'Some cool green coat XL'),
(3, 250.00, 'Some cool green coat L'),
(4, 100.0, 'Some whiskey'),
(5, 42.0, 'This is not a match')
INSERT INTO RowKeyValue VALUES
(1, 'Color', 'Red'),
(1, 'Size', 'XL'),
(1, 'Kind', 'Coat'),
(2, 'Color', 'Green'),
(2, 'Size', 'XL'),
(2, 'Kind', 'Coat'),
(3, 'Color', 'Green'),
(3, 'Size', 'L'),
(3, 'Kind', 'Coat'),
(4, 'Color', 'Green'),
(4, 'Size', 'Medium'),
(4, 'Kind', 'Whiskey')
INSERT INTO TargetKeyValue VALUES
(55, 'Color', 'Red'),
(56, 'Color', 'Green'),
(56, 'Size', 'XL'),
(57, 'Kind', 'Coat'),
(58, 'Color', 'Green'),
(58, 'Size', 'Medium'),
(58, 'Kind', 'Whiskey')
This gives the following tables:
-- table OriginalRows
Id Cost BunchOfOtherCols
1 56 Some cool red coat
2 80 Some cool green coat XL
3 250 Some cool green coat L
4 100 Some whiskey
5 42 This is not a match
-- table RowKeyValue
RowId KeyPart ValuePart
1 Color Red
1 Size XL
1 Kind Coat
2 Color Green
2 Size XL
2 Kind Coat
3 Color Green
3 Size L
3 Kind Coat
4 Color Green
4 Size Medium
4 Kind Whiskey
-- table TargetKeyValue
TargetId TargetKey TargetValue
55 Color Red
56 Color Green
56 Size XL
57 Kind Coat
58 Color Green
58 Size Medium
58 Kind Whiskey
The function below will give this correct outcome:
Id Cost BunchOfOtherCols IsTargetMatch TargetKeyId
1 56 Some cool red coat 1 55
2 80 Some cool green coat XL 1 56
3 250 Some cool green coat L 1 57
4 100 Some whiskey 1 58
5 42 This is not a match 0 NULL
In other words:
The code below uses cursors, but this proves slow (understandably so since it's basically just a non-indexed table scan over and over).
Another approach I tried was using XML PATH queries, but that turned out to be a non-starter (it was easy, but too slow as well).
I'm aware this is a non-trivial task in relation databases, but I hope there's still a rather straightforward solution. What I have below kinda works, and I might just use a batch process to store the results or something, unless there's a better way to do this using SET
operations or, idunno, FULL JOIN
?
Any solution that can be used in a view (i.e., not involving dynamic SQL or calling SP's) is fine. We used to have an SP-based solution but since data needs to be analyzed in PowerBI and other systems, SQL Views and determinism are the way to go.
Here's a fully working minimal example of what I'm after. The function is the part that I'm looking at to replace with a less procedural and more functional, i.e. set-based approach:
CREATE TABLE TargetKeyValue(TargetId INT, TargetKey NVARCHAR(50), TargetValue NVARCHAR(50))
CREATE TABLE OriginalRows(Id INT, Cost DECIMAL, BunchOfOtherCols NVARCHAR(500),
CONSTRAINT [PK_Id] PRIMARY KEY CLUSTERED ([Id] ASC))
CREATE TABLE RowKeyValue(RowId INT, KeyPart NVARCHAR(50), ValuePart NVARCHAR(50),
CONSTRAINT [FK_RowId_Id] FOREIGN KEY (RowId) REFERENCES OriginalRows(Id))
INSERT INTO OriginalRows VALUES
(1, 55.5, 'Some cool red coat'),
(2, 80.0, 'Some cool green coat XL'),
(3, 250.00, 'Some cool green coat L'),
(4, 100.0, 'Some whiskey'),
(5, 42.0, 'This is not a match')
INSERT INTO RowKeyValue VALUES
(1, 'Color', 'Red'),
(1, 'Size', 'XL'),
(1, 'Kind', 'Coat'),
(2, 'Color', 'Green'),
(2, 'Size', 'XL'),
(2, 'Kind', 'Coat'),
(3, 'Color', 'Green'),
(3, 'Size', 'L'),
(3, 'Kind', 'Coat'),
(4, 'Color', 'Green'),
(4, 'Size', 'Medium'),
(4, 'Kind', 'Whiskey')
INSERT INTO TargetKeyValue VALUES
(55, 'Color', 'Red'),
(56, 'Color', 'Green'),
(56, 'Size', 'XL'),
(57, 'Kind', 'Coat'),
(58, 'Color', 'Green'),
(58, 'Size', 'Medium'),
(58, 'Kind', 'Whiskey')
GO
CREATE FUNCTION dbo.MatchTargetAgainstKeysFromRow
(
@rowid INT
)
RETURNS @MatchResults TABLE(
IsTargetMatch BIT,
TargetKeyId INT)
AS
BEGIN
--
-- METHOD (1) (faster, by materializing the xml field into a cross-over lookup table)
--
-- single row from activities as key/value pairs multi-row
DECLARE @rowAsKeyValue AS TABLE(KeyPart NVARCHAR(1000), ValuePart NVARCHAR(MAX))
INSERT INTO @rowAsKeyValue (KeyPart, ValuePart)
SELECT KeyPart, ValuePart FROM RowKeyValue WHERE RowId = @rowid
DECLARE @LookupColumn NVARCHAR(100)
DECLARE @LookupValue NVARCHAR(max)
DECLARE @TargetId INT
DECLARE @CurrentTargetId INT
DECLARE @IsMatch INT
DECLARE key_Cursor CURSOR
LOCAL STATIC FORWARD_ONLY READ_ONLY
FOR SELECT TargetKey, TargetValue, TargetId FROM TargetKeyValue ORDER BY TargetId
OPEN key_Cursor
FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId
WHILE @@FETCH_STATUS = 0
BEGIN
SET @IsMatch = (SELECT COUNT(*) FROM @rowAsKeyValue WHERE KeyPart = @LookupColumn AND ValuePart = @LookupValue)
IF(@IsMatch = 0)
BEGIN
-- move to next key that isn't the current key
SET @CurrentTargetId = @TargetId
WHILE @@FETCH_STATUS = 0 AND @CurrentTargetId = @TargetId
BEGIN
FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId
END
END
ELSE
BEGIN
SET @CurrentTargetId = @TargetId
WHILE @@FETCH_STATUS = 0 AND @IsMatch > 0 AND @CurrentTargetId = @TargetId
BEGIN
FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId
IF @CurrentTargetId = @TargetId
SET @IsMatch = (SELECT COUNT(*) FROM @rowAsKeyValue WHERE KeyPart = @LookupColumn AND ValuePart = @LookupValue)
END
IF @IsMatch > 0
BEGIN
-- we found a positive matching key, nothing more to do
BREAK
END
END
END
DEALLOCATE key_Cursor -- deallocating a cursor also closes it
INSERT @MatchResults
SELECT
(CASE WHEN (SELECT COUNT(*) FROM @rowAsKeyValue) > 0 THEN 1 ELSE 0 END),
(CASE WHEN @IsMatch > 0 THEN @CurrentTargetId ELSE NULL END)
RETURN
END
GO
-- function in action
select * from OriginalRows
cross apply dbo.MatchTargetAgainstKeysFromRow(Id) fn
-- cleanup
drop function dbo.MatchTargetAgainstKeysFromRow
drop table TargetKeyValue
drop table RowKeyValue
drop table OriginalRows
This question is a case of Relational Division With Remainder, with multiple dividends and divisors.
Relational division is basically the opposite of a join: in this case, we want to know which OriginalRows
match which TargetIds
, based on every key/value pair for the TargetId
matching the key/value pairs of the OriginalRows
.
There are many ways to do this, here are a few:
SELECT
r.Id,
r.Cost,
r.BunchOfOtherCols,
t.TargetId
FROM OriginalRows r
OUTER APPLY (
SELECT ttKV.TargetId
FROM TargetKeyValue tKV
LEFT JOIN RowKeyValue rKV
ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
AND rKV.RowId = r.Id
GROUP BY tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.RowId) -- all target k/vs have match
) t;
SELECT
r.Id,
r.Cost,
r.BunchOfOtherCols,
tKV.TargetId
FROM OriginalRows r
CROSS JOIN TargetKeyValue tKV
LEFT JOIN RowKeyValue rKV
ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
AND rKV.RowId = r.Id
GROUP BY
r.Id,
r.Cost,
r.BunchOfOtherCols,
tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.RowId) -- all target k/vs have match
SELECT
r.Id,
r.Cost,
r.BunchOfOtherCols,
tKV.TargetId
FROM OriginalRows r
CROSS JOIN TargetKeyValue tKV
CROSS APPLY (VALUES (CASE WHEN EXISTS (SELECT 1
FROM RowKeyValue rKV
WHERE rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
AND rKV.RowId = r.Id
) THEN 1 END
) ) rKV(IsMatch)
GROUP BY
r.Id,
r.Cost,
r.BunchOfOtherCols,
tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.IsMatch) -- all target k/vs have match
Instead of the HAVING COUNT(*) = COUNT(rKV.RowId)
you could also replace that with
HAVING COUNT(CASE WHEN rKV.RowId IS NULL THEN 1 END) = 0 -- all target k/vs have match
If you want a function for a single OriginalRows
, that is somewhat simpler:
CREATE FUNCTION dbo.MatchTargetAgainstKeysFromRow
(
@rowid INT
)
RETURNS TABLE
AS RETURN (
SELECT
tKV.TargetId
FROM TargetKeyValue tKV
LEFT JOIN RowKeyValue rKV
ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
AND rKV.RowId = @rowid
GROUP BY
tKV.TargetId
HAVING COUNT(*) = COUNT(rKV.RowId) -- all target k/vs have match
);
GO