google-cloud-platform google-bigquery pii google-cloud-dlp

How to use the Google DLP API to delete sensitive content from data stored in Google BigQuery?


I have a table in Google BigQuery that contains some sensitive fields. I have read about and understand data inspection, but I cannot find a way to use the DLP API to redact the data directly in BigQuery.

Two questions:

  1. Is it possible to do this using only the DLP API?
  2. If not, what is the best way to fix the data in a table that runs into terabytes?

Solution

  • The API does not yet support de-identifying BigQuery data directly.

    You can, however, write a Dataflow pipeline that calls content.deidentify. If you batch your rows into Table objects (https://cloud.google.com/dlp/docs/reference/rest/v2/ContentItem#Table), this can work quite efficiently.
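A minimal sketch of the batching idea in Python, assuming you call the REST method `projects.content.deidentify` yourself (authentication, the HTTP POST, and the Dataflow wiring are left out). It groups rows into batches, packs each batch into a `ContentItem` `Table` (headers plus rows of `Value`s, as in the linked reference), and asks DLP to replace whole columns using a `recordTransformations` field transformation with `replaceConfig`. The helper names, the column names, the `[REDACTED]` token and the batch size of 500 are all hypothetical; pick a batch size that keeps each request under DLP's documented size limits. For simplicity every cell is sent as a `stringValue`.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Sequence

# Hypothetical batch size: tune it so each request stays within the
# request-size limits documented for content.deidentify.
DEFAULT_BATCH_SIZE = 500


def chunk_rows(rows: Iterable[Dict[str, Any]], size: int = DEFAULT_BATCH_SIZE) -> Iterator[List[Dict[str, Any]]]:
    """Group an iterable of row dicts into lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[Dict[str, Any]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_deidentify_request(
    rows: Sequence[Dict[str, Any]],
    columns: Sequence[str],
    sensitive_columns: Sequence[str],
    replacement: str = "[REDACTED]",
) -> Dict[str, Any]:
    """Build a content.deidentify JSON body carrying `rows` as one Table item.

    Every cell is sent as a stringValue (None or a missing key becomes ""),
    which is a simplification. The sensitive columns are replaced wholesale.
    """
    table = {
        "headers": [{"name": c} for c in columns],
        "rows": [
            {"values": [{"stringValue": "" if row.get(c) is None else str(row[c])} for c in columns]}
            for row in rows
        ],
    }
    return {
        "deidentifyConfig": {
            "recordTransformations": {
                "fieldTransformations": [
                    {
                        "fields": [{"name": c} for c in sensitive_columns],
                        "primitiveTransformation": {
                            "replaceConfig": {"newValue": {"stringValue": replacement}}
                        },
                    }
                ]
            }
        },
        "item": {"table": table},
    }


def rows_from_response(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn the Table in a deidentify response back into row dicts."""
    table = response["item"]["table"]
    names = [h["name"] for h in table["headers"]]
    return [
        {name: value.get("stringValue", "") for name, value in zip(names, r["values"])}
        for r in table.get("rows", [])
    ]


if __name__ == "__main__":
    sample = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
    for batch in chunk_rows(sample, size=2):
        body = build_deidentify_request(batch, ["id", "email"], ["email"])
        print(json.dumps(body, indent=2))
```

In an Apache Beam / Dataflow pipeline, one option is to let `beam.BatchElements` do the job of `chunk_rows`, with a `DoFn` that sends each request body and writes the rows recovered by `rows_from_response` to an output table.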