sqlsql-serverjoinsetcross-apply

Proper Set operation to find a matching set in a set of sets, or full join?


TLDR

How to match a set of sets against a single set and bind that to the corresponding rows?

Given a row that has a linked summary table with key/value pairs describing properties of that row, and a bunch of search-descriptions (target) that describes how to sum up stuff from that row, how can I find which search-descriptions match a given row, based on matching the properties-table against the key/value pairs in the search-description?

Simplified example:

CREATE TABLE TargetKeyValue(TargetId INT, TargetKey NVARCHAR(50), TargetValue NVARCHAR(50))
CREATE TABLE OriginalRows(Id INT, Cost DECIMAL, BunchOfOtherCols NVARCHAR(500),
    CONSTRAINT [PK_Id] PRIMARY KEY CLUSTERED ([Id] ASC))
CREATE TABLE RowKeyValue(RowId INT, KeyPart NVARCHAR(50), ValuePart NVARCHAR(50),
     CONSTRAINT [FK_RowId_Id] FOREIGN KEY (RowId) REFERENCES OriginalRows(Id))

INSERT INTO OriginalRows VALUES
    (1, 55.5, 'Some cool red coat'),
    (2, 80.0, 'Some cool green coat XL'),
    (3, 250.00, 'Some cool green coat L'),
    (4, 100.0, 'Some whiskey'),
    (5, 42.0, 'This is not a match')

INSERT INTO RowKeyValue VALUES
    (1, 'Color', 'Red'),
    (1, 'Size', 'XL'),
    (1, 'Kind', 'Coat'),
    (2, 'Color', 'Green'),
    (2, 'Size', 'XL'),
    (2, 'Kind', 'Coat'),
    (3, 'Color', 'Green'),
    (3, 'Size', 'L'),
    (3, 'Kind', 'Coat'),
    (4, 'Color', 'Green'),
    (4, 'Size', 'Medium'),
    (4, 'Kind', 'Whiskey')


INSERT INTO TargetKeyValue VALUES
    (55, 'Color', 'Red'),
    (56, 'Color', 'Green'),
    (56, 'Size', 'XL'),
    (57, 'Kind', 'Coat'),
    (58, 'Color', 'Green'),
    (58, 'Size', 'Medium'),
    (58, 'Kind', 'Whiskey')

This gives the following tables:


-- table OriginalRows
Id  Cost    BunchOfOtherCols
1   56      Some cool red coat
2   80      Some cool green coat XL
3   250     Some cool green coat L
4   100     Some whiskey
5   42      This is not a match

-- table RowKeyValue
RowId   KeyPart ValuePart
1       Color   Red
1       Size    XL
1       Kind    Coat
2       Color   Green
2       Size    XL
2       Kind    Coat
3       Color   Green
3       Size    L
3       Kind    Coat
4       Color   Green
4       Size    Medium
4       Kind    Whiskey

-- table TargetKeyValue
TargetId    TargetKey   TargetValue
55          Color       Red
56          Color       Green
56          Size        XL
57          Kind        Coat
58          Color       Green
58          Size        Medium
58          Kind        Whiskey

Expected outcome

The function below will give this correct outcome:

Id  Cost    BunchOfOtherCols            IsTargetMatch   TargetKeyId
1   56      Some cool red coat          1               55
2   80      Some cool green coat XL     1               56
3   250     Some cool green coat L      1               57
4   100     Some whiskey                1               58
5   42      This is not a match         0               NULL

In other words:

Current approach with cursors... alas

The code below uses cursors, but this proves slow (understandably so since it's basically just a non-indexed table scan over and over).

Another approach I tried was using XML PATH queries, but that turned out to be a non-starter (it was easy, but too slow as well).

I'm aware this is a non-trivial task in relation databases, but I hope there's still a rather straightforward solution. What I have below kinda works, and I might just use a batch process to store the results or something, unless there's a better way to do this using SET operations or, idunno, FULL JOIN?

Any solution that can be used in a view (i.e., not involving dynamic SQL or calling SP's) is fine. We used to have an SP-based solution but since data needs to be analyzed in PowerBI and other systems, SQL Views and determinism are the way to go.

Here's a fully working minimal example of what I'm after. The function is the part that I'm looking at to replace with a less procedural and more functional, i.e. set-based approach:

CREATE TABLE TargetKeyValue(TargetId INT, TargetKey NVARCHAR(50), TargetValue NVARCHAR(50))
CREATE TABLE OriginalRows(Id INT, Cost DECIMAL, BunchOfOtherCols NVARCHAR(500),
    CONSTRAINT [PK_Id] PRIMARY KEY CLUSTERED ([Id] ASC))
CREATE TABLE RowKeyValue(RowId INT, KeyPart NVARCHAR(50), ValuePart NVARCHAR(50),
     CONSTRAINT [FK_RowId_Id] FOREIGN KEY (RowId) REFERENCES OriginalRows(Id))

INSERT INTO OriginalRows VALUES
    (1, 55.5, 'Some cool red coat'),
    (2, 80.0, 'Some cool green coat XL'),
    (3, 250.00, 'Some cool green coat L'),
    (4, 100.0, 'Some whiskey'),
    (5, 42.0, 'This is not a match')

INSERT INTO RowKeyValue VALUES
    (1, 'Color', 'Red'),
    (1, 'Size', 'XL'),
    (1, 'Kind', 'Coat'),
    (2, 'Color', 'Green'),
    (2, 'Size', 'XL'),
    (2, 'Kind', 'Coat'),
    (3, 'Color', 'Green'),
    (3, 'Size', 'L'),
    (3, 'Kind', 'Coat'),
    (4, 'Color', 'Green'),
    (4, 'Size', 'Medium'),
    (4, 'Kind', 'Whiskey')


INSERT INTO TargetKeyValue VALUES
    (55, 'Color', 'Red'),
    (56, 'Color', 'Green'),
    (56, 'Size', 'XL'),
    (57, 'Kind', 'Coat'),
    (58, 'Color', 'Green'),
    (58, 'Size', 'Medium'),
    (58, 'Kind', 'Whiskey')

GO


CREATE FUNCTION dbo.MatchTargetAgainstKeysFromRow
(
    @rowid INT
)
RETURNS @MatchResults TABLE(
    IsTargetMatch BIT,
    TargetKeyId INT)

AS
BEGIN

    --
    -- METHOD (1) (faster, by materializing the xml field into a cross-over lookup table)
    --

    -- single row from activities as key/value pairs multi-row
    DECLARE @rowAsKeyValue AS TABLE(KeyPart NVARCHAR(1000), ValuePart NVARCHAR(MAX))
    INSERT INTO @rowAsKeyValue (KeyPart, ValuePart)
        SELECT KeyPart, ValuePart FROM RowKeyValue WHERE RowId = @rowid


    DECLARE @LookupColumn NVARCHAR(100)
    DECLARE @LookupValue NVARCHAR(max)
    DECLARE @TargetId INT
    DECLARE @CurrentTargetId INT
    DECLARE @IsMatch INT
    DECLARE key_Cursor CURSOR
        LOCAL STATIC FORWARD_ONLY READ_ONLY
        FOR SELECT TargetKey, TargetValue, TargetId FROM TargetKeyValue  ORDER BY TargetId

    OPEN key_Cursor
    FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @IsMatch = (SELECT COUNT(*) FROM @rowAsKeyValue WHERE KeyPart = @LookupColumn AND ValuePart = @LookupValue)
        IF(@IsMatch = 0)
        BEGIN
            -- move to next key that isn't the current key
            SET @CurrentTargetId = @TargetId
            WHILE @@FETCH_STATUS = 0 AND @CurrentTargetId = @TargetId
            BEGIN
                FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId
            END
        END
        ELSE
            BEGIN
                SET @CurrentTargetId = @TargetId
                WHILE @@FETCH_STATUS = 0 AND @IsMatch > 0 AND @CurrentTargetId = @TargetId
                BEGIN
                    FETCH NEXT FROM key_Cursor INTO @LookupColumn, @LookupValue, @TargetId
                    IF @CurrentTargetId = @TargetId
                        SET @IsMatch = (SELECT COUNT(*) FROM @rowAsKeyValue WHERE KeyPart = @LookupColumn AND ValuePart = @LookupValue)
                END
                IF @IsMatch > 0
                BEGIN
                    -- we found a positive matching key, nothing more to do
                    BREAK
                END
            END
    END

    DEALLOCATE key_Cursor       -- deallocating a cursor also closes it

    INSERT @MatchResults
    SELECT
        (CASE WHEN (SELECT COUNT(*) FROM @rowAsKeyValue) > 0 THEN 1 ELSE 0 END),
        (CASE WHEN @IsMatch > 0 THEN @CurrentTargetId ELSE NULL END)

    RETURN
END

GO

-- function in action
select * from OriginalRows
cross apply dbo.MatchTargetAgainstKeysFromRow(Id) fn

-- cleanup
drop function dbo.MatchTargetAgainstKeysFromRow
drop table TargetKeyValue
drop table RowKeyValue
drop table OriginalRows

Solution

  • This question is a case of Relational Division With Remainder, with multiple dividends and divisors.

    Relational division is basically the opposite of a join: in this case, we want to know which OriginalRows match which TargetIds, based on every key/value pair for the TargetId matching the key/value pairs of the OriginalRows.

    There are many ways to do this, here are a few:

    SELECT
        r.Id,
        r.Cost,
        r.BunchOfOtherCols,
        t.TargetId
    FROM OriginalRows r
    OUTER APPLY (
        SELECT ttKV.TargetId
        FROM TargetKeyValue tKV
        LEFT JOIN RowKeyValue rKV
            ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
            AND rKV.RowId = r.Id
        GROUP BY tKV.TargetId
        HAVING COUNT(*) = COUNT(rKV.RowId)  -- all target k/vs have match
    ) t;
    
    SELECT
        r.Id,
        r.Cost,
        r.BunchOfOtherCols,
        tKV.TargetId
    FROM OriginalRows r
    CROSS JOIN TargetKeyValue tKV
    LEFT JOIN RowKeyValue rKV
        ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
        AND rKV.RowId = r.Id
    GROUP BY
        r.Id,
        r.Cost,
        r.BunchOfOtherCols,
        tKV.TargetId
    HAVING COUNT(*) = COUNT(rKV.RowId)  -- all target k/vs have match
    
    SELECT
        r.Id,
        r.Cost,
        r.BunchOfOtherCols,
        tKV.TargetId
    FROM OriginalRows r
    CROSS JOIN TargetKeyValue tKV
    CROSS APPLY (VALUES (CASE WHEN EXISTS (SELECT 1
            FROM RowKeyValue rKV
                WHERE rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
                AND rKV.RowId = r.Id
            ) THEN 1 END
    ) ) rKV(IsMatch)
    GROUP BY
        r.Id,
        r.Cost,
        r.BunchOfOtherCols,
        tKV.TargetId
    HAVING COUNT(*) = COUNT(rKV.IsMatch)  -- all target k/vs have match
    

    Instead of the HAVING COUNT(*) = COUNT(rKV.RowId) you could also replace that with

    HAVING COUNT(CASE WHEN rKV.RowId IS NULL THEN 1 END) = 0 -- all target k/vs have match
    

    If you want a function for a single OriginalRows, that is somewhat simpler:

    CREATE FUNCTION dbo.MatchTargetAgainstKeysFromRow
    (
        @rowid INT
    )
    RETURNS TABLE
    AS RETURN (
        SELECT
            tKV.TargetId
        FROM TargetKeyValue tKV
        LEFT JOIN RowKeyValue rKV
            ON rKV.KeyPart = tKV.TargetKey AND rKV.ValuePart = tKV.TargetValue
            AND rKV.RowId = @rowid
        GROUP BY
            tKV.TargetId
        HAVING COUNT(*) = COUNT(rKV.RowId)  -- all target k/vs have match
    );
    
    GO