Anonymizing customer data for development or testing


I need to take production data containing real customer info (names, addresses, phone numbers, etc.) and move it into a dev environment, but I'd like to remove any trace of real customer info.

Some of the answers to this question can help me generate new test data, but how do I then replace those columns in my production data while keeping the other relevant columns?

Let's say I have a table with 10,000 fake names. Should I do a cross join in a SQL UPDATE, or something like this?

UPDATE customers  -- placeholder for the real table name (TABLE itself is a reserved word)
SET lastname = (SELECT TOP 1 name FROM samplenames ORDER BY NEWID())
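With an uncorrelated subquery like the one above, SQL Server may evaluate `SELECT TOP 1 ... ORDER BY NEWID()` only once for the whole statement, so every row can end up with the same last name. One way around this is to correlate the lookup with each row: number the fake names 1..N and give each customer name number (id mod N) + 1. The following is a minimal sketch under assumptions. The `customers` table, its non-negative integer `id` key, and the sample rows are hypothetical stand-ins; `lastname`, `samplenames`, and `name` come from the question. It sticks to standard SQL (window functions, multi-row `VALUES`), so it should run on SQL Server 2008 or later as well as on SQLite 3.25 or later.

```sql
-- Hypothetical demo tables: "customers" stands in for the copied production table,
-- "samplenames" holds the fake names.
CREATE TABLE customers (id INT PRIMARY KEY, lastname VARCHAR(50));
CREATE TABLE samplenames (name VARCHAR(50));

INSERT INTO customers (id, lastname)
VALUES (1, 'Real1'), (2, 'Real2'), (3, 'Real3'), (4, 'Real4');
INSERT INTO samplenames (name)
VALUES ('Adams'), ('Baker'), ('Clark');

-- The subquery references customers.id, so it is correlated with each row being updated.
-- Sample names are numbered 1..N; each customer gets name number (id mod N) + 1.
-- Assumes customers.id is a non-negative integer.
UPDATE customers
SET lastname = (
    SELECT n.name
    FROM (
        SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS rn
        FROM samplenames
    ) AS n
    WHERE n.rn = (customers.id % (SELECT COUNT(*) FROM samplenames)) + 1
);

-- expected: 1 Baker, 2 Clark, 3 Adams, 4 Baker
SELECT id, lastname FROM customers ORDER BY id;
```

The mapping is deterministic: the same `id` always receives the same fake name, and nothing from the real value is kept in the column. The other identifying columns (first name, address, phone) would need the same treatment, each against its own sample table.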

Solution

  • Anonymizing data can be tricky, and if it isn't done correctly it can get you into trouble, as happened to AOL when it released supposedly anonymized search logs in 2006. I would make every effort to create test data from scratch before trying to convert existing customer data. Even with names and contact details removed, someone may still be able to work out whom the data belongs to through behavioral analysis or other data points you might not consider sensitive. I would rather be safe than sorry.