I need to find a numeric client ID, so I created a custom info type:
custom_info_types = [
    {
        "info_type": {"name": "CLIENTID"},
        "regex": {"pattern": r'\d{7,8}'},
    }
]
As expected, the job returned a lot of findings, all with a VERY_LIKELY likelihood.
To reduce the findings, I'd like to use hotwords in "reverse" mode: if the column name does not contain the string "cli", then reduce the likelihood.
The documentation has examples of how to do the opposite, but since every finding already has a VERY_LIKELY likelihood, that does not help.
from google.cloud import dlp_v2  # needed for dlp_v2.Likelihood below

hotword_rule = {
    "hotword_regex": {"pattern": "(?i)(.*cli.*)(?-i)"},
    "likelihood_adjustment": {
        "fixed_likelihood": dlp_v2.Likelihood.VERY_LIKELY
    },
    "proximity": {"window_before": 1},
}
Is there any way to do this?
Thanks for your help!
To accomplish this, set the default likelihood of your custom_info_type to VERY_UNLIKELY and keep your hotword rule as-is. That way every match is flagged as VERY_UNLIKELY unless the header/context matches "cli", in which case the finding is boosted to VERY_LIKELY.
Something like:
custom_info_types = [
    {
        "info_type": {"name": "CLIENTID"},
        "regex": {"pattern": r'\d{7,8}'},
        "likelihood": "VERY_UNLIKELY"
    }
]
When you leave the likelihood blank in the custom_info_type definition, it defaults to VERY_LIKELY.
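Putting the two pieces together, here is a minimal sketch of what the full inspect config could look like. It is written with plain dicts and string enum names so it runs without the client library. The `build_inspect_config` helper is a hypothetical name of mine, and attaching the hotword rule through `rule_set` and adding `min_likelihood` are my assumptions about how you wire things up, not something taken from your job definition.

```python
def build_inspect_config():
    """Build an inspect config dict (hypothetical helper, for illustration)."""
    # Every regex match starts out as VERY_UNLIKELY instead of the
    # default VERY_LIKELY.
    custom_info_types = [
        {
            "info_type": {"name": "CLIENTID"},
            "regex": {"pattern": r"\d{7,8}"},
            "likelihood": "VERY_UNLIKELY",
        }
    ]

    # Same hotword rule as in the question: a "cli" hotword in the
    # context boosts the finding back to VERY_LIKELY.
    hotword_rule = {
        "hotword_regex": {"pattern": "(?i)(.*cli.*)(?-i)"},
        "likelihood_adjustment": {"fixed_likelihood": "VERY_LIKELY"},
        "proximity": {"window_before": 1},
    }

    # Assumption: the rule is attached to CLIENTID through a rule set.
    rule_set = [
        {
            "info_types": [{"name": "CLIENTID"}],
            "rules": [{"hotword_rule": hotword_rule}],
        }
    ]

    return {
        "custom_info_types": custom_info_types,
        "rule_set": rule_set,
        # Assumption: make the cutoff explicit so that only boosted
        # findings are reported.
        "min_likelihood": "LIKELY",
    }


inspect_config = build_inspect_config()
```

You would then pass `inspect_config` wherever your job currently takes its inspect config. With the threshold set to LIKELY, the un-boosted VERY_UNLIKELY matches should be filtered out and only the ones in "cli" columns should remain.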
Let me know if this works.