Hello everyone and welcome to my nightmare. I am preparing a database in order to run a number of SSIS packages to migrate the required data into a data warehouse.
I am in the process of updating customer names with accented characters to non-accented characters prior to running the SSIS package, as this causes my DimCustomer Population
package to fail.
The query I used ran and ran quickly and efficiently (I was living the dream) however, I am now faced with three records which cause my UPDATE
statement to run and run to the point I have to cancel the query after about an hour as this is nowhere near acceptable.
The characters contained in these names are: í
á
and ẽ
. My suspicion is that these are only recognised in a particular format (which I cannot seem to find).
The queries and update I run are below for your perusal:
SELECT lastname COLLATE latin1_general_ci_ai AS 'LastName',
MIN(lastname COLLATE latin1_general_ci_as) AS 'CorrectLastName',
MAX(lastname COLLATE latin1_general_ci_as) AS 'IncorrectLastName'
INTO #lastname
FROM learner WITH (nolock)
GROUP BY lastname COLLATE latin1_general_ci_ai
HAVING MIN(lastname COLLATE latin1_general_ci_as) <> MAX(lastname COLLATE latin1_general_ci_as);
IF Object_id ('tempdb..#LastNameUpdate') IS NOT NULL
DROP TABLE #lastnameupdate
SELECT L.learnerid,
L.lastname,
LN.correctlastname,
LN.incorrectlastname
INTO #lastnameupdate
FROM learner L WITH (nolock)
INNER JOIN #lastname AS LN
ON L.lastname = LN.incorrectlastname
UPDATE L
SET L.lastname = LNU.correctlastname
FROM learner L
INNER JOIN #lastnameupdate AS LNU
ON L.learnerid = LNU.learnerid
Please ask any questions if you need anymore info, I am a regular stack checker for pre-posted solutions and (even if a little sad) general reading. Any information, guidance or advise would be greatly appreciated. Thanks!
Just for speeding things up: you can add the following after creating the #lastname table.
Delete from #lastname where CorrectLastName = incorrectLastName
I think if you have only a few incorrect last names this will speed a lot up.
You can also try to avoid the creation of the table #lastnameupdate.
Because with a the use of a sub query instead you could do the same.
This should also speed the whole up a little bit.