sqlsql-serverstringdata-manipulationdata-quality

Split Full Name with Format: {Last, First Middle} Comprehensive Cases


My client sent me name data as a Name string which includes the last, first, and middle names in a single entry. I need them split into LastName, FirstName, and MiddleName. I have found some scripts online, but they don't serve my purposes because they either (1) use a different format, or (2) don't handle edge cases very well. See the examples below:

  1. Nightingale, Florence -> Florence Nightingale
  2. Bond, James Bond -> James Bond Bond
  3. Abbott, Edwin A. -> Edwin A. Abbott

Can someone help me write a SQL Server script that splits a string into the various pieces I'm looking for?


Solution

  • Please note the following:

    1. Always request normalized data to ensure the highest data quality. I tried to enumerate all possible cases for last, first, and middle name combinations but I'm sure I did not get all of them.
    2. My script requires the format: LastName@DELIMITER1@DELIMITER2FirstName@DELIMITER2MiddleName, but can be easily altered for other formats.
    3. This script does not separate tiles like Dr., or handle suffixes.
    4. Credit to MemKills for the idea of the test data set, which I expanded.

    >

    DECLARE @DELIMITER1 varchar(1), @DELIMITER2 varchar(1), @MAX_LENGTH int
    SET @DELIMITER1 = ','
    SET @DELIMITER2 = ' '
    SET @MAX_LENGTH = 50
    
    SELECT  [Name],
        SUBSTRING(Name,1,CHARINDEX(@DELIMITER1,Name) -1) AS LastName,                   -- Less one char for @DELIMITER1
        SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH) AS FirstAndMiddle,   -- Plus two for @DELIMITER1 and @DELIMITER2
        CASE 
            -- Middle name follows two-name first names like Mary Ann 
            WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0
                THEN SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH)
            ELSE NULL
        END AS MiddleName,
    
        CASE 
            -- Count the number of @DELIMITER2. Choose the string between the @DELIMITER1 and the final @DELIMITER2. 
            WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0
                Then SUBSTRING(Name, CHARINDEX(@DELIMITER1,Name)+ 2, 
                     (LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH))
                     - LEN(SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH))))
            ELSE SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)
        END AS FirstName
    FROM 
    (
        SELECT  [Name] = 'Zzz, A' UNION ALL
        SELECT  'de Zzz, Aaa' UNION ALL
        SELECT  'Zzz, Aaaa' UNION ALL
        SELECT  'Zzz, A B' UNION ALL
        SELECT  'Zzz, Aaaa Bbbb' UNION ALL
        SELECT  'de Zzz, Aaaa' UNION ALL
        SELECT  'de Zzz, Aaaa B' UNION ALL
        SELECT  'van Zzz, Aaaa B' UNION ALL
        SELECT  'Yyy-Zzz, Aaaa B' UNION ALL
        SELECT  'd''Zzz, Aaaa B' UNION ALL
        SELECT  'Zzz, Aaaa Bbbb C' UNION ALL
        SELECT  'Zzz, Aaaa Bbbb Cccc'
    ) AS X