Tags: python, pytorch, pysyft

What does this warning mean in PATE analysis?


I got the following warning while running a PATE analysis:

Warning: May not have used enough values of l. Increase 'moments' variable and run again.

from syft.frameworks.torch.differential_privacy import pate
data_dep_eps, data_ind_eps = pate.perform_analysis(teacher_preds=preds, indices=indices, noise_eps=0.1)
print("Data Independent Epsilon:", data_ind_eps)
print("Data Dependent Epsilon:", data_dep_eps)

The warning went away after I increased the value of the "moments" parameter in the "pate.perform_analysis" function, but I want to know why that is.

data_dep_eps, data_ind_eps = pate.perform_analysis(teacher_preds=preds, indices=indices, noise_eps=0.1, moments=20)
print("Data Independent Epsilon:", data_ind_eps)
print("Data Dependent Epsilon:", data_dep_eps)

Solution

  • TL;DR: perform_analysis wants to double-check unusually small epsilon results by using more granular computation.

    The pate.perform_analysis function iterates through the data (technically the privacy loss random variable) and computes various epsilons. It uses the moments parameter to know how granular this iteration should be. When using the default 8 moments, it will compute 8 epsilons. Then it returns the minimum of the computed epsilons, as you can see in the source code.

    When this function returns a very small data-dependent epsilon, it could be because (A) the teacher predictions have a high amount of agreement, or (B) the computation wasn't granular enough and the true epsilon is higher. When only 8 epsilons are computed, it's possible that they happen to be anomalies that paint an overly optimistic picture of the overall epsilon! So when the function sees a surprisingly small epsilon, it warns you: you may want to increase the moments parameter to compute more epsilons and make sure you've found the real minimum. If you still get the same result after increasing moments, your data probably does have a high amount of agreement, so it truly has a small data-dependent epsilon compared to its data-independent epsilon.

    Hopefully that makes sense to you at a high level. If you want more details on the math behind this, you can check out the research paper that inspired the source code.
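The minimum-over-moments behavior described above can be sketched with a toy example. Note that the epsilon curve below is invented purely for illustration (it is not syft's actual data-dependent bound); the point is only the structure: one candidate epsilon per moment order, keep the smallest, and be suspicious when the smallest sits at the boundary of the orders tried.

```python
import math

def toy_min_epsilon(max_order, delta=1e-5):
    """Toy sketch of the minimum-over-moments step in a PATE-style
    analysis. The epsilon curve below is made up for illustration;
    it is NOT syft's actual formula."""
    # One candidate epsilon per moment order l, then keep the smallest.
    candidates = [0.05 * l + math.log(1.0 / delta) / l
                  for l in range(1, max_order + 1)]
    return min(candidates)

# With 8 moments, the smallest candidate sits at the boundary (l = 8):
# exactly the suspicious case the warning flags, since the true minimum
# may lie beyond the orders that were tried.
eps_8 = toy_min_epsilon(8)

# Recomputing with 20 moments reaches past that boundary and finds a
# genuinely smaller epsilon (this toy curve bottoms out near l = 15).
eps_20 = toy_min_epsilon(20)

print(eps_8, eps_20)
assert eps_20 < eps_8
```

In this toy, increasing the number of moments changes the answer because the original range stopped before the curve's true minimum; if it hadn't, both calls would return the same value, which mirrors the "same result after increasing moments" case described above.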