Tags: python-3.x, tensorflow, neural-network, adversarial-machines

Get gradient value necessary to break an image


I've been experimenting with adversarial images and I read up on the fast gradient sign method from the following link https://arxiv.org/pdf/1412.6572.pdf...

[Figure from the paper: a panda image plus a barely visible perturbation (itself classified as "nematode") is confidently misclassified as a gibbon.]

The instructions explain that the necessary gradient can be calculated using backpropagation, giving the perturbation η = ε · sign(∇x J(θ, x, y)).
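
As I understand it, in TensorFlow this would correspond to something like the following (just a rough sketch; loss and x here stand for a cross-entropy against the true label and the input image, not the variables in my code below):

grad = tf.gradients(loss, x)[0]        # dJ/dx via backpropagation
perturbation = 0.007 * tf.sign(grad)   # eta = epsilon * sign(gradient)
adversarial_x = x + perturbation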

I've been successful at generating adversarial images, but I have failed at extracting the gradient necessary to create one. I will demonstrate what I mean.

Let us assume that I have already trained my algorithm using logistic regression. I restore the model and extract the number I wish to change into an adversarial image. In this case it is the number 2...

# construct model
logits = tf.matmul(x, W) + b
pred = tf.nn.softmax(logits)
...
...
# assign the images of number 2 to the variable
sess.run(tf.assign(x, labels_of_2))
# evaluate the softmax to get the current prediction
sess.run(pred)

# placeholder for target label
fake_label = tf.placeholder(tf.int32, shape=[1])
# setup the fake loss
fake_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=fake_label)

# minimize fake loss using gradient descent,
# calculating the derivatives of the weight of the fake image will give the direction of weights necessary to change the prediction
adversarial_step = tf.train.GradientDescentOptimizer(learning_rate=FLAGS.learning_rate).minimize(fake_loss, var_list=[x])

# continue calculating the derivative until the prediction changes for all 10 images
for i in range(FLAGS.training_epochs):
    # fake label tells the training algorithm to use the weights calculated for number 6
    sess.run(adversarial_step, feed_dict={fake_label:np.array([6])})
    sess.run(pred)

This is my approach, and it works perfectly. It takes my image of number 2 and changes it only slightly so that when I run the following...

x_in = np.expand_dims(x[0], axis=0)
classification = sess.run(tf.argmax(pred, 1))
print(classification)

it will predict the number 2 as a number 6.

The issue is that I need to extract the gradient necessary to trick the neural network into thinking the number 2 is a 6. I need to use this gradient to create the nematode-style perturbation mentioned above.
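
In other words, what I am after is something like this (just a sketch of the idea, assuming I had kept a copy of the original images in original_images before running the adversarial steps; that name is not in my real code):

# the accumulated perturbation is the difference between the
# adversarial images and the original images
nematode = sess.run(x) - original_images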

I am not sure how I can extract the gradient value. I tried looking at tf.gradients but I was unable to figure out how to produce an adversarial image using this function. I implemented the following after the fake_loss variable above...

gradients = tf.gradients(fake_loss, x)

for i in range(FLAGS.training_epochs):
    # calculate gradient with weight of number 6
    gradient_value = sess.run(gradients, feed_dict={fake_label:np.array([6])})
    # update the image of number 2
    gradient_update = x+0.007*gradient_value[0]
    sess.run(tf.assign(x, gradient_update))
    sess.run(pred)

Unfortunately the prediction did not change in the way I wanted, and moreover this logic resulted in a rather blurry image.

I would appreciate an explanation as to what I need to do in order to calculate and extract the gradient that will trick the neural network, so that if I were to take this gradient and apply it to my image as the nematode-style perturbation, it would result in a different prediction.


Solution

  • Why not let the TensorFlow optimizer add the gradients to your image? You can still evaluate the nematode to get the resulting gradients that were added.
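
    For instance, using the names from the sample code below, the noise that has been added so far can simply be read back out once training has run (a minimal illustration, reusing the variables defined further down):

    noise = sess.run(to_add_image)           # the perturbation ("nematode") accumulated so far
    adversarial = sess.run(combined_images)  # original image plus the clipped noise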


    I created a bit of sample code to demonstrate this with a panda image. It uses the VGG16 neural network to transform your own panda image into an image that is classified as a "goldfish". Every 100 iterations it saves the image as a PDF, so you can print it losslessly and check whether it is still classified as a goldfish.

    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    import IPython.display as ipyd
    from libs import vgg16 # Download here! https://github.com/pkmital/CADL/tree/master/session-4/libs
    
    pandaimage = plt.imread('panda.jpg')
    pandaimage = vgg16.preprocess(pandaimage)
    plt.imshow(pandaimage)
    
    img_4d = np.array([pandaimage])
    
    g = tf.get_default_graph()
    # the original image is held constant; only the noise variable is trainable
    input_placeholder = tf.Variable(img_4d,trainable=False)
    to_add_image = tf.Variable(tf.random_normal([224,224,3], mean=0.0, stddev=0.1, dtype=tf.float32))
    combined_images_not_clamped = input_placeholder+to_add_image
    
    # clamp the combined image to the valid [0, 1] range
    filledmax = tf.fill(tf.shape(combined_images_not_clamped), 1.0)
    filledmin = tf.fill(tf.shape(combined_images_not_clamped), 0.0)
    greater_than_one = tf.greater(combined_images_not_clamped, filledmax)
    
    combined_images_with_max = tf.where(greater_than_one, filledmax, combined_images_not_clamped)
    lower_than_zero = tf.less(combined_images_with_max, filledmin)
    combined_images = tf.where(lower_than_zero, filledmin, combined_images_with_max)
    
    net = vgg16.get_vgg_model()
    tf.import_graph_def(net['graph_def'], name='vgg')
    names = [op.name for op in g.get_operations()]
    
    # feed the clamped adversarial image into VGG16 and read out the softmax probabilities
    style_layer = 'prob:0'
    the_prediction = tf.import_graph_def(
        net['graph_def'],
        name='vgg',
        input_map={'images:0': combined_images},return_elements=[style_layer])
    
    # target distribution: class index 1 is "goldfish" in the ImageNet labels
    goldfish_expected_np = np.zeros(1000)
    goldfish_expected_np[1]=1.0
    goldfish_expected_tf = tf.Variable(goldfish_expected_np,dtype=tf.float32,trainable=False)
    loss = tf.reduce_sum(tf.square(the_prediction[0]-goldfish_expected_tf))
    # Adam updates only to_add_image, the single trainable variable
    optimizer = tf.train.AdamOptimizer().minimize(loss)
    
    
    sess = tf.InteractiveSession()
    sess.run(tf.global_variables_initializer())
    
    
    def show_many_images(*images):
        fig = plt.figure()
        for i in range(len(images)):
            print(images[i].shape)
            subplot_number = 100+10*len(images)+(i+1)
            plt.subplot(subplot_number)
            plt.imshow(images[i])
        plt.show()
    
    
    
    for i in range(1000):
        _, loss_val = sess.run([optimizer,loss])
    
        if i%100==1:
            print("Loss at iteration %d: %f" % (i,loss_val))
            # evaluating to_add_image gives the perturbation ("nematode") added so far
            _, loss_val,adversarial_image,pred,nematode = sess.run([optimizer,loss,combined_images,the_prediction,to_add_image])
            res = np.squeeze(pred)
            average = np.mean(res, 0)
            res = res / np.sum(average)
            plt.imshow(adversarial_image[0])
            plt.show()
            print([(res[idx], net['labels'][idx]) for idx in res.argsort()[-5:][::-1]])
            show_many_images(img_4d[0],nematode,adversarial_image[0])
            plt.imsave('adversarial_goldfish.pdf',adversarial_image[0],format='pdf') # save for printing
    

    Let me know if this helps you!