google-cloud-platform, google-bigquery, standards-compliance, google-cloud-dlp, pii

How do I run Cloud DLP (Data Loss Prevention) on all BigQuery tables in my project?


Per the DLP docs, when you create an inspect job you need to specify a single table reference:

{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"bigquery-public-data",
          "datasetId":"usa_names",
          "tableId":"usa_1910_current"
        },
        "rowsLimit":"1000",
        "sampleMethod":"RANDOM_START",
        "identifyingFields":[
          {
            "name":"name"
          }
        ]
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"FIRST_NAME"
        }
      ],
      "includeQuote":true
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT-ID]",
              "datasetId":"testingdlp",
              "tableId":"bqsample3"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}

This means I'd need to create one inspect job for each table. I want to find sensitive data in all my BigQuery resources, so how can I do that?


Solution

  • To run DLP on all your BigQuery resources, you have two options.

    For a full tutorial on the second option, see the blog post that walks through it.

    Beware: "it is possible for costs to become very high, depending on the quantity of information that you instruct the Cloud DLP to scan. To learn several methods that you can use to keep costs down while also ensuring that you're using the Cloud DLP to scan the exact data that you intend to, see Keeping Cloud DLP costs under control."

    The billing information was current at the time of this writing; for the latest figures, check the DLP pricing docs page.
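
    As a minimal sketch of the first, do-it-yourself approach: list every dataset and table with the BigQuery client, then submit one DLP inspect job per table. This assumes the google-cloud-bigquery and google-cloud-dlp packages are installed and credentials are configured; the findings destination (`testingdlp.bqsample3`) and the `FIRST_NAME` infoType are carried over from the question's example and are placeholders you would change.

    ```python
    def build_inspect_job(project_id, dataset_id, table_id):
        """Build the inspect-job body from the question for a single table."""
        return {
            "storage_config": {
                "big_query_options": {
                    "table_reference": {
                        "project_id": project_id,
                        "dataset_id": dataset_id,
                        "table_id": table_id,
                    },
                    "rows_limit": 1000,
                    "sample_method": "RANDOM_START",
                }
            },
            "inspect_config": {
                "info_types": [{"name": "FIRST_NAME"}],
                "include_quote": True,
            },
            "actions": [
                {
                    "save_findings": {
                        "output_config": {
                            # Placeholder findings table from the question's example.
                            "table": {
                                "project_id": project_id,
                                "dataset_id": "testingdlp",
                                "table_id": "bqsample3",
                            },
                            "output_schema": "BASIC_COLUMNS",
                        }
                    }
                }
            ],
        }


    def scan_all_tables(project_id):
        """Enumerate every table in the project and create one DLP job per table."""
        from google.cloud import bigquery
        from google.cloud import dlp_v2

        bq = bigquery.Client(project=project_id)
        dlp = dlp_v2.DlpServiceClient()
        parent = f"projects/{project_id}"

        for dataset in bq.list_datasets():
            for table in bq.list_tables(dataset):
                job = dlp.create_dlp_job(
                    request={
                        "parent": parent,
                        "inspect_job": build_inspect_job(
                            project_id, table.dataset_id, table.table_id
                        ),
                    }
                )
                print(f"Created {job.name} for {table.dataset_id}.{table.table_id}")
    ```

    Keep the `rows_limit` sampling in place when scanning everything; it is one of the main levers for keeping costs down, as the warning above notes.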