sqldata-quality

SQL - find all examples of values in all colums with given characteristic


I have a dataset (8.5 mill rows), where all values in all columns must be enclosed in quotation symbols (" "). I have discovered that there is a problem - some few records holds values in some columns with the last quotation symbol missing. Now I need to try to get an overview on the issue - which columns have examples of this error (it is due to truncation upstream in the solution).

From the example dummy data inserted below:

How do I write a query, which outputs the columns "Last name" and "Age" due to the missing end quotation in row 2 and 3 in these columns? To be clear - how do I identify columns with sporadic truncated values?

Example data with missing quotation symbols

Thanks,

knn


Solution

  • You need to check the last character in the string. It is done using the substring function, passing an argument of -1 means the last character. And you check to see if it is different than the double quote symbol.

    SELECT * FROM YourTable
    WHERE 
    substr("Last Name", -1) <> '"' OR substr(Age,-1) <> '"'
    

    You can play around with it here http://sqlfiddle.com/#!4/10a77e/1