
SQL query for entity names with small differences


I have to query a very large .csv dataset from a large city. It contains thousands of dirty entries like the example below.

How would I write an SQL query that catches every record matching the following variants of a company's name? Do I have to clean the .csv data somehow before importing it into SQL? Thanks in advance.

'UFS INDUSTRIES INC. DBA SALLY SHERMAN FOODS',
'UFS INDUSTRIES INC. DBA SALLY SHERMAN FOODS.',
'UFS INDUSTRIES INC., DBA SALLY SHERMAN FOODS',
'UFS INDUSTRIES INC., DBA, SALLY SHERMAN FOODS',
'UFS INDUSTRIES INCORPORATED',
'UFS INDUSTRIES INCORPORATED DBA SALLY SHERMAN FOOD',
'UFS INDUSTRIES INCORPORATED DBA SALLY SHERMAN FOODS', 
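
All seven variants above happen to share the prefix 'UFS INDUSTRIES INC', so for this particular sample a prefix LIKE may be enough, without cleaning anything first. The sketch below is only an illustration: the table name vendors and column name are hypothetical, and Python's built-in sqlite3 module is used just to make it runnable. The WHERE clause itself is plain SQL.

```python
import sqlite3

# The seven variants from the question. Table "vendors" and column "name"
# are hypothetical names chosen for this sketch.
VARIANTS = [
    "UFS INDUSTRIES INC. DBA SALLY SHERMAN FOODS",
    "UFS INDUSTRIES INC. DBA SALLY SHERMAN FOODS.",
    "UFS INDUSTRIES INC., DBA SALLY SHERMAN FOODS",
    "UFS INDUSTRIES INC., DBA, SALLY SHERMAN FOODS",
    "UFS INDUSTRIES INCORPORATED",
    "UFS INDUSTRIES INCORPORATED DBA SALLY SHERMAN FOOD",
    "UFS INDUSTRIES INCORPORATED DBA SALLY SHERMAN FOODS",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO vendors (name) VALUES (?)", [(v,) for v in VARIANTS])
# A decoy row (hypothetical) that must NOT be matched.
conn.execute("INSERT INTO vendors (name) VALUES ('SALLY SHERMAN FOODS LLC')")

# Prefix match: every variant starts with 'UFS INDUSTRIES INC'.
# UPPER/TRIM guard against case and leading-space noise; SQLite's LIKE is
# already case-insensitive for ASCII, but e.g. PostgreSQL's LIKE is not.
rows = conn.execute(
    "SELECT id, name FROM vendors "
    "WHERE UPPER(TRIM(name)) LIKE 'UFS INDUSTRIES INC%'"
).fetchall()

for row in rows:
    print(row)
```

A prefix match is a quick filter rather than real deduplication: it would also accept unrelated names that merely start the same way, and it would miss a variant such as 'UFS INDUSTRIES, INC.' because of the comma.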

Solution

  • I am adding this as an answer because I don't have enough reputation to comment, and I think you will find it important.

    Cleaning Messy data in SQL
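
The cleaning can also happen inside SQL instead of in the .csv. One possible approach, sketched here as an assumption rather than as what the linked article necessarily does, is to derive a normalized key for each row: upper-case it, strip periods and commas, rewrite INCORPORATED as INC, and keep only the legal name before ' DBA '. The table vendors is hypothetical. REPLACE, INSTR, SUBSTR, TRIM and UPPER are SQLite built-in string functions; other databases have equivalents, sometimes under different names.

```python
import sqlite3

# Raw values as they might arrive from the CSV (the question's sample).
RAW = [
    "UFS INDUSTRIES INC. DBA SALLY SHERMAN FOODS",
    "UFS INDUSTRIES INC. DBA SALLY SHERMAN FOODS.",
    "UFS INDUSTRIES INC., DBA SALLY SHERMAN FOODS",
    "UFS INDUSTRIES INC., DBA, SALLY SHERMAN FOODS",
    "UFS INDUSTRIES INCORPORATED",
    "UFS INDUSTRIES INCORPORATED DBA SALLY SHERMAN FOOD",
    "UFS INDUSTRIES INCORPORATED DBA SALLY SHERMAN FOODS",
]

# Normalization done in SQL:
#   1. UPPER                  -> ignore case differences
#   2. remove '.' and ','     -> ignore punctuation noise
#   3. 'INCORPORATED' -> 'INC'
#   4. keep the text before ' DBA ' (drops the trade name)
NORMALIZE_SQL = """
WITH cleaned AS (
    SELECT name,
           TRIM(REPLACE(REPLACE(REPLACE(UPPER(name), '.', ''), ',', ''),
                        'INCORPORATED', 'INC')) AS n
    FROM vendors
)
SELECT name,
       CASE WHEN INSTR(n, ' DBA ') > 0
            THEN SUBSTR(n, 1, INSTR(n, ' DBA ') - 1)
            ELSE n
       END AS entity_key
FROM cleaned
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (name TEXT)")
conn.executemany("INSERT INTO vendors (name) VALUES (?)", [(r,) for r in RAW])

rows = conn.execute(NORMALIZE_SQL).fetchall()
keys = {key for _, key in rows}

for name, key in rows:
    print(f"{key!r} <- {name!r}")
```

Every sample variant maps to the key 'UFS INDUSTRIES INC'. Storing that key in its own column lets you GROUP BY or join on it instead of matching raw strings. Plain REPLACE is blunt (it would also turn a hypothetical 'UNINCORPORATED' into 'UNINC'), so larger or messier data may need regular expressions or a fuzzy-matching step on top of this.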