sql-serverset-based

SQL Server: Most efficient way to look for set based on 2 columns


Edit: I realized I asked the question wrong. What I really meant to ask was, "given a set of values for [Column B], check if there is a value in [Column A] that matches all of the set and no others."


I'm not sure if what I want to do has an official name, so searching for my question has been difficult.

Given a table with 2 columns and a list of values, I want to see if this combination (and only this combination) exists in the table.

For instance, given this table (assume it's the whole table):

|----------|----------|
| Column A | Column B |
|----------|----------|
| 12345    | abcde    |
|----------|----------|
| 12345    | xyz      |
|----------|----------|
| 12345    | abcd     |
|----------|----------|
| 12345    | abbba    |
|----------|----------|

And given this input parameter:

Declare @columnBs Varchar(Max) = '["abcde","xyz","abcd","abbba"]';

For that set, I want to return 12345. So, basically, I want to run a check and see if any value in [Column A] matches all of the values in [Column B] to all of the values in @columnBs AND NO OTHER VALUES.

Without a value for [Column A] as a starting point, I'm having trouble even conceiving of a long-form solution.


If it helps to conceptualize this better, this is a solution for messaging where:

So, if a new message comes in for a set of users, I want to see whether there is an existing thread for all of the users supplied by @columnBs and no other users.


Solution

  • I read this that you need to find a value in ColumnA that corresponds to exactly, no more and no less, the same values in your query. So, you need to join the search values, ensure all of them exist for a single ColumnA, and then ensure that no more exist. You could do it by cross joining them, but for larger sets of data, that would have awful performance. This may be a little better:

    -- Set up the source data.
    create table MyTable (
            ColumnA int,
            ColumnB nvarchar(max)
        )
    insert MyTable
        (ColumnA, ColumnB)
        Values
        -- Value 1 contains exactly the same values as we'll be looking for
        (1, 'abcde'),
        (1, 'xyz'),
        (1, 'abcd'),
        (1, 'abbba'),
        -- Value 2 contains all of them, plus one more different value
        (2, 'abcde'),
        (2, 'xyz'),
        (2, 'abcd'),
        (2, 'abbba'),
        (2, 'xxxxx'),
        -- Value 3 contains one less value
        (3, 'abcde'),
        (3, 'xyz'),
        (3, 'abcd'),
        -- Value 4 contains one different value
        (4, 'abcde'),
        (4, 'xyz'),
        (4, 'abcd'),
        (4, 'xxxxxxxxx')
    
    
    -- These are the values we are looking for:
    create table #searchValues (
                value nvarchar(max)
            )
    insert #searchValues
        (value) values
        ('abcde'),
        ('xyz'), 
        ('abcd'), 
        ('abbba')
    
    declare @valueCount int = (select COUNT(*) from #searchValues)
    
    
    select  t.ColumnA
        from (
            -- This inner query finds all ColumnA values
            -- that link to all of the specified ColumnB values.
            select  tab.ColumnA
                from #searchValues t
                join MyTable tab on
                    t.value = tab.ColumnB
                group by tab.ColumnA
                having COUNT(*) = @valueCount
            ) x
        -- And this outer join and group by will filter out those that 
        -- have all the values, plus some more.
        join MyTable t on
            t.ColumnA = x.ColumnA
        group by t.ColumnA
        having COUNT(*) = @valueCount
    
    
    
    drop table #searchValues
    drop table MyTable
    

    This will produce just the value 1 as a result, because it matches exactly.