sql-servert-sqlsql-server-2005non-ascii-characters

Find non-ASCII characters in varchar columns using SQL Server


How can rows with non-ASCII characters be returned using SQL Server?
If you can show how to do it for one column would be great.

I am doing something like this now, but it is not working

select *
from Staging.APARMRE1 as ar
where ar.Line like '%[^!-~ ]%'

For extra credit, if it can span all varchar columns in a table, that would be outstanding! In this solution, it would be nice to return three columns:

 Id | FieldName | InvalidText       |
----+-----------+-------------------+
 25 | LastName  | Solís             |
 56 | FirstName | François          |
100 | Address1  | 123 Ümlaut street |

Invalid characters would be any outside the range of SPACE (3210) through ~ (12710)


Solution

  • try something like this:

    DECLARE @YourTable table (PK int, col1 varchar(20), col2 varchar(20), col3 varchar(20));
    INSERT @YourTable VALUES (1, 'ok','ok','ok');
    INSERT @YourTable VALUES (2, 'BA'+char(182)+'D','ok','ok');
    INSERT @YourTable VALUES (3, 'ok',char(182)+'BAD','ok');
    INSERT @YourTable VALUES (4, 'ok','ok','B'+char(182)+'AD');
    INSERT @YourTable VALUES (5, char(182)+'BAD','ok',char(182)+'BAD');
    INSERT @YourTable VALUES (6, 'BAD'+char(182),'B'+char(182)+'AD','BAD'+char(182)+char(182)+char(182));
    
    --if you have a Numbers table use that, other wise make one using a CTE
    WITH AllNumbers AS
    (   SELECT 1 AS Number
        UNION ALL
        SELECT Number+1
            FROM AllNumbers
            WHERE Number<1000
    )
    SELECT 
        pk, 'Col1' BadValueColumn, CONVERT(varchar(20),col1) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
        FROM @YourTable           y
            INNER JOIN AllNumbers n ON n.Number <= LEN(y.col1)
        WHERE ASCII(SUBSTRING(y.col1, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col1, n.Number, 1))>127
    UNION
    SELECT 
        pk, 'Col2' BadValueColumn, CONVERT(varchar(20),col2) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
        FROM @YourTable           y
            INNER JOIN AllNumbers n ON n.Number <= LEN(y.col2)
        WHERE ASCII(SUBSTRING(y.col2, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col2, n.Number, 1))>127
    UNION
    SELECT 
        pk, 'Col3' BadValueColumn, CONVERT(varchar(20),col3) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
        FROM @YourTable           y
            INNER JOIN AllNumbers n ON n.Number <= LEN(y.col3)
        WHERE ASCII(SUBSTRING(y.col3, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col3, n.Number, 1))>127
    order by 1
    OPTION (MAXRECURSION 1000);
    

    OUTPUT:

    pk          BadValueColumn BadValue
    ----------- -------------- --------------------
    2           Col1           BA¶D
    3           Col2           ¶BAD
    4           Col3           B¶AD
    5           Col1           ¶BAD
    5           Col3           ¶BAD
    6           Col1           BAD¶
    6           Col2           B¶AD
    6           Col3           BAD¶¶¶
    
    (8 row(s) affected)