csvgoogle-cloud-platformencodinggoogle-cloud-dlp

make FPE algorithm of DLP in GCP refrain from using a specific set of characters when encoding


I was trying to de-identify 1 column of CSV file using the FPE algorithm of DLP service in GCP. My CSV file has a comma(,) as the delimiter. But in the encoded data for some rows comma(,) is included in the data which is causing trouble when re-identifying the data. Is there any way to specify to the FPE algorithm to refrain from using a specific set of characters when encoding?


Solution

  • https://github.com/googleapis/googleapis/blob/01d4201e2620da2084d2151522c25cf49dda9da3/google/privacy/dlp/v2/dlp.proto#L852

    If you inspect it as a structured file using ByteContentItem.CSV then the transformations will apply over the cell values ... this will avoid having the commas get caught up in findings.