sqlsql-server

How to obtain only rows with a nvarchar column containing at least one of a list of string?


I've got a list of terms in column Terms of table TermsTable. I want to obtain all of the rows of TableA, where column Description contains at least one of the terms in TermsTable. How can I do that?

I tried, inspired by Combining "LIKE" and "IN" for SQL Server:

SELECT DISTINCT a.*
FROM TableA a
INNER JOIN TermsTable t ON a.Description LIKE '%'+t.Terms+'%'

but it filters different rows with the same values (consider that maybe I want only some of the columns in TableA that may be equal but referring to different things). If I remove the DISTINCT, I got duplicated rows if more than one term is found.

For example, taking into account the following TableA:

ID Description Value
1 This is an example 11
2 This is another example 3
3 However this is left out 5
4 This is another example 3

And this is my TermsTable:

Terms
example
foo
another

I want as the output (I simplified before putting a.* but I just really want a few columns)

Description Value
This is an example 11
This is another example 3
This is another example 3

With the following query I got what I want, but I think it's clumsy and I'd like to know if there isn't a better way to solve this:

SELECT a.*
FROM TableA a
WHERE a.ID IN (SELECT DISTINCT a.ID -- Or whatever the PK is
               FROM TableA a
               INNER JOIN TermsTable t ON a.Description LIKE '%' + t.Terms + '%')

EDIT 2025-03-13: I was looking for whole words match.


Solution

  • As already mentioned, the idea is to use EXISTS and to apply the word match in the correlated subquery.

    A "standard" LIKE might not satisfy your requirements. See following sample data:

    TableA:

    ID Description Value
    1 This is an example 11
    2 This is another example 3
    3 However this will be left out 5
    4 This isx nonfooxyz 100
    5 Fooxy is valid 10
    6 Foo isalso ok 20

    TermsTable:

    Description
    example
    foo
    is
    another

    You likely expect following result:

    Description Value
    This is an example 11
    This is another example 3
    Fooxy is valid 10
    Foo isalso ok 20

    The first two rows contain the word "example", the third row contains the word "is" and the last row contains the word "foo". The string "However this will be left out" does not contain any of the words in your TermsTable as a whole word, but it generally does contain the string "is". The string "This isx nonfooxyz" also does not contain any matching words as a whole word, but it generally contains both the string "is" and also the string "foo".

    So how to get the expected result?

    If you try a "standard" LIKE or CHARINDEX check, then all rows will be returned because the question if the string is a whole word or appears only as a part of a word does not matter.

    So those options will fetch all rows:

    ...WHERE Description LIKE '%' + Terms +'%'
    ...WHERE CHARINDEX(Terms, Description) > 0)

    You might argue you can add spaces to the LIKE condition, but it will still be faulty because the spaces could ocuur everywhere, you don't know where. And the condition would maybe be case-sensitive and you don't want that.

    Long story short: A simple trick is a regex check to check whole words and to add some separator (let's say a dot) to also cover the first and the last word of each text.

    Something like this:

    ...WHERE '.' + description + '.' LIKE '%[^a-z]' + Terms + '[^a-z]%')

    So the entire query will be this one:

    SELECT a.Description, a.Value
    FROM TableA a
    WHERE EXISTS (
      SELECT 1 
      FROM TermsTable
      WHERE '.' + a.description + '.' LIKE '%[^a-z]' + Terms + '[^a-z]%');
    

    See this db<>fiddle which shows all the things I explained here with the sample data I showed.