
Google Cloud DLP Re-identify PII data using Deterministic encryption


I was experimenting with the article Google provides on de-identifying and re-identifying credit card numbers using deterministic encryption with AES-SIV:

https://cloud.google.com/solutions/creating-cloud-dlp-de-identification-transformation-templates-pii-dataset#creating_a_key_encryption_key_kek

Following it, I created a Google DLP template to de-identify data. In the template's test option it works when I provide a 3-line CSV with the correct header names (I am using a record-type template).

[Screenshot: DLP Template]

[Screenshot: DLP Template Test]

According to the following link and the video provided there, the same template can be used to re-identify the data back to the original values:

"Cloud DLP can perform both de-identification and re-identification on an entire column using a RecordTransformation without a surrogate annotation."

https://cloud.google.com/dlp/docs/pseudonymization#cryptographic-hashing

But when I tried this, the already de-identified value was encrypted again into a new encoded value instead of being decoded, as shown below.

[Screenshot: DLP Template Re-identify Not working]

Please let me know what I am doing wrong and how I can successfully re-identify PII using deterministic encryption with AES-SIV.

Note: I got the same behavior when I continued with the next part of the article; re-identifying the data did not work as described there:

https://cloud.google.com/solutions/validating-de-identified-data-bigquery-re-identifying-pii-data


Solution

  • You can't re-identify in the console; you need to use the API for this. And because you don't use a surrogate prefix, you have to rebuild your table in JSON (tedious to do by hand, but you can script it).

    The full details are in the API reference for the content:reidentify method.


    The JSON to submit contains the table (your de-identified table) and the template to use:

    {
      "item": {
        "table": {
          "headers": [
            {
              "name": "id"
            },
            {
              "name": "phone"
            },
            {
              "name": "email"
            }
          ],
          "rows": [
            {
              "values": [
                {
                  "stringValue": "1"
                },
                {
                  "stringValue": "ASoxvJC6oo4fCgKm+ppgT6j2lSqdj179SbLc"
                },
                {
                  "stringValue": "ARkspehZ720J0f/r5zqlVN65PS756cxQDbwSniZ+g8iV"
                }
              ]
            },
            {
              "values": [
                {
                  "stringValue": "2"
                },
                {
                  "stringValue": "ATfmBVs25TEGYHLu+6DBBhpq6dk8LSJq+XyR"
                },
                {
                  "stringValue": "AZZhJLTmQKjlcXEROCRPu9u81G98/SBac/AlWXwtgiYe"
                }
              ]
            }
          ]
        }
      },
      "reidentifyTemplateName": "projects/<YOUR_PROJECT>/locations/global/deidentifyTemplates/test-email-DeId"
    }
    

    I saved this content in a file named dlpdata.json.

    The curl request to call the API:

    curl -H "Content-type: application/json"  \
         -H "Authorization: Bearer $(gcloud auth print-access-token)" \
         -X POST -d @dlpdata.json \
         https://dlp.googleapis.com/v2/projects/<YOUR_PROJECT>/content:reidentify
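
The "script it" step mentioned above can be sketched roughly as follows. This is a minimal example, assuming the de-identified data is available as CSV text whose first row holds the column names; the function name `csv_to_reidentify_request` is hypothetical and not part of any Google library. It only builds the same request body shown above and does not call the API.

```python
import csv
import io
import json


def csv_to_reidentify_request(csv_text, template_name):
    """Build a content:reidentify request body from de-identified CSV text.

    Hypothetical helper: assumes the first CSV row contains the header names
    and that every cell should be sent as a stringValue.
    """
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)  # first row holds the column names
    rows = [
        {"values": [{"stringValue": cell} for cell in row]}
        for row in reader
        if row  # skip blank lines
    ]
    return {
        "item": {
            "table": {
                "headers": [{"name": name} for name in headers],
                "rows": rows,
            }
        },
        "reidentifyTemplateName": template_name,
    }


if __name__ == "__main__":
    # Sample row taken from the JSON above; replace with your own CSV content.
    sample = (
        "id,phone,email\n"
        "1,ASoxvJC6oo4fCgKm+ppgT6j2lSqdj179SbLc,"
        "ARkspehZ720J0f/r5zqlVN65PS756cxQDbwSniZ+g8iV\n"
    )
    body = csv_to_reidentify_request(
        sample,
        "projects/<YOUR_PROJECT>/locations/global/deidentifyTemplates/test-email-DeId",
    )
    print(json.dumps(body, indent=2))
```

Redirecting the output to dlpdata.json gives a file that can be passed to the curl command above with `-d @dlpdata.json`.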