google-cloud-platform, google-bigquery, google-cloud-storage, google-cloud-dataflow, google-cloud-dlp

Is it possible to run a Dataflow DLP de-identification job on a group of files in GCS?


I have a large number of CSV files in a folder that I need to run a de-identification job on, and I was wondering whether there is any way to run that job on the folder / multiple files. At the moment I'm creating Dataflow jobs with DLP templates, and that has worked fine for single datasets. I know that in GCS you can run DLP scans on folders containing multiple files, but there you are only allowed to use inspection templates, not de-identification templates.

Moving them into their own bucket is also not an option, since the parent folder already lives in a bucket and buckets can't be nested.

Any help would be much appreciated, thanks.


Solution

  • Correct, running a de-identification template directly over a GCS folder is not yet supported; DLP inspection jobs can scan a bucket or folder, but de-identification of stored files is not offered as a managed job. The recommended solution is to use Dataflow, where the file source can read a wildcard pattern that matches every file in the folder.
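A minimal sketch of that Dataflow approach with the Apache Beam Python SDK: `ReadFromText` accepts a wildcard, so one pipeline can cover every CSV in the folder, batch the lines, and send each batch through the DLP `deidentify_content` API with an existing de-identification template. The bucket, folder, project, and template names here are placeholders, and batching/limit values are assumptions to illustrate staying under DLP's per-request content size limit.

```python
# Sketch: de-identify every CSV under a GCS "folder" with one Beam pipeline.
# All resource names below (bucket, folder, project, template) are placeholders.

def chunk_lines(lines, max_bytes=400_000):
    """Re-batch lines so each DLP request stays well under the API's
    per-request content-size limit (400 KB leaves headroom)."""
    batch, size = [], 0
    for line in lines:
        n = len(line.encode("utf-8")) + 1  # +1 for the joining newline
        if batch and size + n > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += n
    if batch:
        yield batch


def run(argv=None):
    # Heavy imports live inside run() so chunk_lines stays importable
    # without apache-beam / google-cloud-dlp installed locally.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    project = "my-project"  # placeholder
    template = (f"projects/{project}/deidentifyTemplates/"
                "my-deid-template")  # placeholder: existing de-id template

    def deidentify(batch):
        # Creating a client per batch is wasteful; a production DoFn would
        # build it once in setup(). Kept inline here for brevity.
        from google.cloud import dlp_v2
        client = dlp_v2.DlpServiceClient()
        resp = client.deidentify_content(request={
            "parent": f"projects/{project}/locations/global",
            "deidentify_template_name": template,
            "item": {"value": "\n".join(batch)},
        })
        return resp.item.value

    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (p
         # The wildcard is what makes this a multi-file job.
         | "ReadCsvs" >> beam.io.ReadFromText("gs://my-bucket/my-folder/*.csv")
         | "Batch" >> beam.BatchElements(min_batch_size=100,
                                         max_batch_size=1000)
         | "LimitBytes" >> beam.FlatMap(chunk_lines)
         | "Deidentify" >> beam.Map(deidentify)
         | "Write" >> beam.io.WriteToText("gs://my-bucket/deidentified/part"))
```

Launching `run()` with the usual Dataflow pipeline options (`--runner=DataflowRunner`, `--project`, `--region`, `--temp_location`) runs it as a Dataflow job; note that batching lines this way discards the per-file grouping, so if the output must mirror the input files one-to-one, a per-file read (e.g. matching files first and processing each whole file) would be needed instead.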