sql-serversql-server-2008

Is full text search suitable for searching people's names?


I have a user table with a single column for a person's name:

CREATE TABLE [dbo].[Users]
(
    Id bigint NOT NULL,
    Name nvarchar(80) NOT NULL,
    PRIMARY KEY CLUSTERED (Id ASC)
)

The Name column can contain either a full name or just the first name or anything really (separated by spaces). In order to implement a search on Name, I would like to utilize SQL's full-text search, but not sure if it's suitable for searching names/nicknames and not actual words. Also the question is - which language do I choose when creating the FT index on Name?

Any other considerations?

Thank you.


Solution

  • It seems that if you want to search multi-part names, full-text search is the easiest and most appropriate approach (please correct me if I'm wrong). The other alternative being LIKE '%query%', however it has too many disadvantages:

    So I went ahead and implemented a full-text search. My queries look something like this:

    SELECT * FROM Users WHERE CONTAINS(Name, '"John*"')
    

    The only slight difficulty is that I had to convert user query (John) into a CONTAINS-friendly query ("John*"). To do that, I implemented this method in my UserRepository:

    /// <summary>
    /// Converts user-entered search query into a query that can be consumed by CONTAINS keyword of SQL Server.
    /// </summary>
    /// <example>If query is "John S Ju", the result will be "\"John*\" AND \"S*\" AND \"Ju*\"".</example>
    /// <param name="query">Query entered by user.</param>
    /// <returns>String instance.</returns>
    public static string GetContainsQuery(string query)
    {
        string containsQuery = string.Empty;
    
        var terms = query.Split(new[] { ' ' }, StringSplitOptions.None);
    
        if (terms.Length > 1)
        {
            for (int i = 0; i < terms.Length; i++)
            {
                string term = terms[i].Trim();
    
                // Add wildcard term, e.g. - "term*". The reason to add wildcard is because we want
                // to allow search by partially entered name parts (partially entered first name and/or
                // partially entered last name, etc).
                containsQuery += "\"" + term + "*\"";
    
                // If it's not the last term.
                if (i < terms.Length - 1)
                {
                    // We want all terms inside user query to match.
                    containsQuery += " AND ";
                }
            }
    
            containsQuery = containsQuery.Trim();
        }
        else
        {
            containsQuery = "\"" + query + "*\"";
        }
    
        return containsQuery;
    }
    

    Hope this helps anyone stumbling into the same issue.

    PS - I wrote a blogpost documenting this.