tensorflow, keras, generative-adversarial-network, cleverhans, adversarial-machines

Question on ElasticNet algorithm implemented in Cleverhans


I'm trying to use the Elastic-Net algorithm implemented in CleverHans to generate adversarial samples for a classification task. The main problem is that I'm trying to use it in a way that obtains a higher confidence at classification time on a target class (different from the original one), but I'm not able to reach good results. The system I'm trying to fool is a DNN with a softmax output over 10 classes.
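
For context, the classifier looks roughly like this (a hypothetical sketch of the kind of model under attack, not my exact network):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28, 1)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),  # softmax output over 10 classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy')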

For instance:

  1. Given a sample of class 3, I want to generate an adversarial sample of class 0.
  2. Using the default hyperparameters of the ElasticNetMethod in CleverHans, I can obtain a successful attack, i.e. the class assigned to the adversarial sample becomes class 0, but the confidence is quite low (about 30%). This also happens with different values of the hyperparameters.
  3. My goal is a much higher confidence (at least 90%).
  4. For other algorithms like "FGSM" or "MadryEtAl" I can reach this goal by looping the attack until the sample is classified as the target class with a confidence greater than 90% (see the sketch below), but I can't apply this iteration to the EAD algorithm: at each step of the iteration it yields the adversarial sample generated at the first step, and in the following iterations the sample remains unchanged. (I know this may happen because the algorithm is different from the other two mentioned, but I'm trying to find a solution that reaches my goal.)
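
For FGSM the iteration looks roughly like this (a minimal sketch; it assumes the same model, wrap, sess, image and target variables as the EAD code below, and the eps value is only an example):

import numpy as np
from cleverhans.attacks import FastGradientMethod

fgsm = FastGradientMethod(wrap, sess=sess)
fgsm_params = {'eps': 0.01, 'clip_min': 0, 'clip_max': 1, 'y_target': target}

adv_x = image
while True:
    adv_x = fgsm.generate_np(adv_x, **fgsm_params)  # re-attack the current sample
    probs = model.predict(adv_x)[0]
    if np.argmax(probs) == 0 and probs[0] >= 0.9:   # target class 0 with >= 90% confidence
        break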

This is the code I'm currently using to generate adversarial samples:

import numpy as np
from cleverhans.attacks import ElasticNetMethod
from cleverhans.utils_keras import KerasModelWrapper

# `model`, `sess`, `image` and `target` (one-hot encoding of class 0) are defined elsewhere
ead_params = {'binary_search_steps': 9, 'max_iterations': 100, 'learning_rate': 0.001,
              'clip_min': 0, 'clip_max': 1, 'y_target': target}
adv_x = image
found_adv = False
threshold = 0.9
wrap = KerasModelWrapper(model)
ead = ElasticNetMethod(wrap, sess=sess)

while not found_adv:

    adv_x = ead.generate_np(adv_x, **ead_params)  # re-attack the current sample
    prediction = model.predict(adv_x).tolist()
    pred_class = np.argmax(prediction[0])
    confidence = prediction[0][pred_class]

    if pred_class == 0 and confidence >= threshold:  # target class reached with high confidence
        found_adv = True
        

The while loop should keep re-attacking the sample until the target class is reached with a confidence greater than 90%. This code works as intended with FGSM and Madry, but runs forever with EAD.
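
The stall is easy to see (a hypothetical check, reusing the variables above): applying EAD a second time to its own output returns an essentially unchanged sample, so the loop can never make progress:

adv_1 = ead.generate_np(image, **ead_params)
adv_2 = ead.generate_np(adv_1, **ead_params)  # second application changes (almost) nothing
print(np.max(np.abs(adv_2 - adv_1)))          # ~0, hence the infinite loop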

Library versions:

  • Tensorflow: 2.2.0
  • Keras: 2.4.3
  • Cleverhans: 2.0.0-451ccecad450067f99c333fc53592201

Can anyone help me?

Thanks a lot.


Solution

  • For anyone interested in this problem, the previous code can be modified in the following two ways to work properly:

    FIRST SOLUTION:

    # same imports and setup (model, sess, image, target) as in the question
    prediction = model.predict(image)
    initial_predicted_class = np.argmax(prediction[0])
    ead_params = {'binary_search_steps': 9, 'max_iterations': 100, 'learning_rate': 0.001,
                  'confidence': 1, 'clip_min': 0, 'clip_max': 1, 'y_target': target}
    adv_x = image
    found_adv = False
    threshold = 0.9
    wrap = KerasModelWrapper(model)
    ead = ElasticNetMethod(wrap, sess=sess)

    while not found_adv:

        adv_x = ead.generate_np(adv_x, **ead_params)
        prediction = model.predict(adv_x).tolist()
        pred_class = np.argmax(prediction[0])
        confidence = prediction[0][pred_class]

        if pred_class == initial_predicted_class and confidence >= threshold:
            found_adv = True
        else:
            ead_params['confidence'] += 1  # raise EAD's confidence margin and try again
    

    This solution uses the confidence parameter implemented in the library: we increase it by 1 whenever the probability of the target class is still below the threshold, and run the attack again.
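
    As I understand it, confidence corresponds to the margin kappa in the Carlini-Wagner-style loss that EAD uses for targeted attacks; a rough sketch of that loss term on the logits (ead_target_loss is a hypothetical helper, not part of the library):

    def ead_target_loss(logits, t, kappa):
        # logits: pre-softmax outputs Z(x); t: index of the target class
        other = np.max(np.delete(logits, t))   # largest logit among non-target classes
        return max(other - logits[t], -kappa)  # larger kappa demands a larger logit gap

    A larger kappa therefore forces the target logit further above all the others, which is what translates into a higher softmax confidence.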

    SECOND SOLUTION:

    # same imports and setup (model, sess, image) as in the question
    prediction = model.predict(image)
    initial_predicted_class = np.argmax(prediction[0])

    # note: no 'y_target' here, so the attack runs untargeted
    ead_params = {'beta': 5e-3, 'binary_search_steps': 6, 'max_iterations': 10,
                  'learning_rate': 3e-2, 'clip_min': 0, 'clip_max': 1}
    threshold = 0.96
    eps_hyp = 0.5  # epsilon hyperparameter that scales down the perturbation
    adv_x = image
    found_adv = False
    wrap = KerasModelWrapper(model)
    ead = ElasticNetMethod(wrap, sess=sess)

    while not found_adv:

        new_adv_x = ead.generate_np(adv_x, **ead_params)
        pert = new_adv_x - adv_x            # perturbation proposed by EAD
        new_adv_x = adv_x - eps_hyp * pert  # apply it in the opposite direction
        # rescale the perturbed image back into [0, 1]
        new_adv_x = (new_adv_x - np.min(new_adv_x)) / (np.max(new_adv_x) - np.min(new_adv_x))
        adv_x = new_adv_x
        prediction = model.predict(new_adv_x).tolist()
        pred_class = np.argmax(prediction[0])
        confidence = prediction[0][pred_class]
        print(pred_class)
        print(confidence)

        if pred_class == initial_predicted_class and confidence >= threshold:
            found_adv = True
    

    The second solution makes the following modifications to the original code:

    - initial_predicted_class is the class predicted by the model on the benign sample ("0" in our example).

    - In the parameters of the algorithm (ead_params) we don't set a target class.

    - We then obtain the perturbation computed by the algorithm as pert = new_adv_x - adv_x, where adv_x is the original image (in the first step of the while loop) and new_adv_x is the perturbed sample generated by the algorithm.

    - This step is useful because the original EAD algorithm computes the perturbation to maximize the loss w.r.t. class "0", but in our case we want to minimize it.

    - So we can compute the new perturbed image as new_adv_x = adv_x - eps_hyp * pert (where eps_hyp is an epsilon hyperparameter I've introduced to reduce the perturbation), and then we normalize the new perturbed image back into [0, 1].

    - I've tested the code on a large number of images, and the confidence always increases, so I think this can be a good solution for this purpose.

    I think the second solution allows obtaining finer perturbations.
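
    One way to verify that claim (a hypothetical measurement, reusing adv_x and image from above) is to compare the distortion produced by the two solutions:

    l2_dist = np.linalg.norm((adv_x - image).ravel())  # L2 norm of the total perturbation
    linf_dist = np.max(np.abs(adv_x - image))          # largest per-pixel change
    print(l2_dist, linf_dist)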