python-3.x tensorflow neural-network adversarial-machines

Get gradient value necessary to break an image

I've been experimenting with adversarial images and I read up on the fast gradient sign method from the following link https://arxiv.org/pdf/1412.6572.pdf...

The instructions explain that the necessary gradient can be calculated using backpropagation...

I've been successful at generating adversarial images but I have failed at attempting to extract the gradient necessary to create an adversarial image. I will demonstrate what I mean.

Let us assume that I have already trained my algorithm using logistic regression. I restore the model and I extract the number I wish to change into an adversarial image. In this case it is the number 2...

# construct model
logits = tf.matmul(x, W) + b
pred = tf.nn.softmax(logits)
...
...
# assign the images of number 2 to the variable
sess.run(tf.assign(x, labels_of_2))
# setup softmax
sess.run(pred)

# placeholder for target label
fake_label = tf.placeholder(tf.int32, shape=[1])
# setup the fake loss
fake_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=fake_label)

# minimize fake loss using gradient descent,
# calculating the derivatives of the weight of the fake image will give the direction of weights necessary to change the prediction
adversarial_step = tf.train.GradientDescentOptimizer(learning_rate=FLAGS.learning_rate).minimize(fake_loss, var_list=[x])

# continue calculating the derivative until the prediction changes for all 10 images
for i in range(FLAGS.training_epochs):
    # fake label tells the training algorithm to use the weights calculated for number 6
    sess.run(adversarial_step, feed_dict={fake_label:np.array([6])})
    sess.run(pred)

This is my approach, and it works perfectly. It takes my image of number 2 and changes it only slightly so that when I run the following...

x_in = np.expand_dims(x[0], axis=0)
classification = sess.run(tf.argmax(pred, 1))
print(classification)

it will predict the number 2 as a number 6.

The issue is, I need to extract the gradient necessary to trick the neural network into thinking number 2 is 6. I need to use this gradient to create the "nematode" perturbation shown in the paper linked above.

I am not sure how I can extract the gradient value. I tried looking at tf.gradients but I was unable to figure out how to produce an adversarial image using this function. I implemented the following after the fake_loss variable above...

gradients = tf.gradients(fake_loss, x)

for i in range(FLAGS.training_epochs):
    # calculate gradient with weight of number 6
    gradient_value = sess.run(gradients, feed_dict={fake_label:np.array([6])})
    # update the image of number 2
    gradient_update = x+0.007*gradient_value[0]
    sess.run(tf.assign(x, gradient_update))
    sess.run(pred)

Unfortunately the prediction did not change in the way I wanted, and moreover this logic resulted in a rather blurry image.

I would appreciate an explanation as to what I need to do in order to calculate and extract the gradient that will trick the neural network, so that if I were to take this gradient and apply it to my image as a nematode, it will result in a different prediction.

Solution

Why not let the TensorFlow optimizer add the gradients to your image? You can still evaluate the nematode to get the accumulated gradient updates that were added.

I put together some sample code that demonstrates this with a panda image. It uses the VGG16 neural network to turn your own panda image into a "goldfish" image. Every 100 iterations it saves the image as a PDF, so you can print it losslessly and check whether the image is still classified as a goldfish.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from libs import vgg16 # Download here! https://github.com/pkmital/CADL/tree/master/session-4/libs

pandaimage = plt.imread('panda.jpg')
pandaimage = vgg16.preprocess(pandaimage)
plt.imshow(pandaimage)

img_4d = np.array([pandaimage])

g = tf.get_default_graph()
input_placeholder = tf.Variable(img_4d,trainable=False)
to_add_image = tf.Variable(tf.random_normal([224,224,3], mean=0.0, stddev=0.1, dtype=tf.float32))
combined_images_not_clamped = input_placeholder+to_add_image

# Clamp the perturbed image to the valid pixel range [0, 1].
combined_images = tf.clip_by_value(combined_images_not_clamped, 0.0, 1.0)

net = vgg16.get_vgg_model()
tf.import_graph_def(net['graph_def'], name='vgg')
names = [op.name for op in g.get_operations()]

style_layer = 'prob:0'
the_prediction = tf.import_graph_def(
    net['graph_def'],
    name='vgg',
    input_map={'images:0': combined_images},return_elements=[style_layer])

goldfish_expected_np = np.zeros(1000)
goldfish_expected_np[1]=1.0
goldfish_expected_tf = tf.Variable(goldfish_expected_np,dtype=tf.float32,trainable=False)
loss = tf.reduce_sum(tf.square(the_prediction[0]-goldfish_expected_tf))
optimizer = tf.train.AdamOptimizer().minimize(loss)


sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())


def show_many_images(*images):
    fig = plt.figure()
    for i in range(len(images)):
        print(images[i].shape)
        subplot_number = 100+10*len(images)+(i+1)
        plt.subplot(subplot_number)
        plt.imshow(images[i])
    plt.show()



for i in range(1000):
    if i % 100 == 1:
        # Take one optimizer step and fetch everything needed for inspection in the same run
        _, loss_val, adversarial_image, pred, nematode = sess.run(
            [optimizer, loss, combined_images, the_prediction, to_add_image])
        print("Loss at iteration %d: %f" % (i, loss_val))
        # 'prob:0' should already be the softmax output, so it is used as-is
        res = np.squeeze(pred)
        plt.imshow(adversarial_image[0])
        plt.show()
        # top-5 predictions, most likely first
        print([(res[idx], net['labels'][idx]) for idx in res.argsort()[-5:][::-1]])
        # original image, perturbation (nematode), adversarial image
        show_many_images(img_4d[0], nematode, adversarial_image[0])
        plt.imsave('adversarial_goldfish.pdf', adversarial_image[0], format='pdf') # save for printing
    else:
        sess.run(optimizer)
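
If you would rather compute the gradient yourself than let the optimizer apply it, here is a minimal NumPy sketch for a softmax-regression model like the one in the question (logits = xW + b). The random weights, the 784-pixel input, the step size and all helper names are my own assumptions for illustration, not taken from your model. For this kind of model the gradient of the cross-entropy loss with respect to the input is (p - y)·Wᵀ, where p is the softmax output and y the one-hot target. Note the minus sign in the update: adding the gradient of the fake loss, as in the tf.gradients attempt in the question, increases that loss and moves the image away from the target class, which may be why the prediction did not change the way you wanted.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def input_gradient(x, W, b, target):
    """Gradient of the cross-entropy loss for class `target` w.r.t. the input x."""
    p = softmax(x @ W + b)      # shape (1, n_classes)
    y = np.zeros_like(p)
    y[0, target] = 1.0          # one-hot target label
    return (p - y) @ W.T        # shape (1, n_features)

def targeted_perturbation(x, W, b, target, eps=0.007, max_steps=1000):
    """Step against the sign of the gradient until the model predicts `target`."""
    x_adv = x.copy()
    for _ in range(max_steps):
        if np.argmax(x_adv @ W + b, axis=1)[0] == target:
            break
        grad = input_gradient(x_adv, W, b, target)
        # Minus sign: descend the loss of the target class; keep pixels in [0, 1].
        x_adv = np.clip(x_adv - eps * np.sign(grad), 0.0, 1.0)
    # The first return value is the perturbation (the "nematode").
    return x_adv - x, x_adv

# Toy stand-in for a trained model (assumption): random weights, 784 inputs, 10 classes.
rng = np.random.RandomState(0)
W = rng.normal(scale=0.1, size=(784, 10))
b = np.zeros(10)
x = rng.uniform(size=(1, 784))

original = int(np.argmax(x @ W + b, axis=1)[0])
target = (original + 4) % 10    # any class other than the original one

nematode, x_adv = targeted_perturbation(x, W, b, target)
print("original prediction:", original)
print("adversarial prediction:", int(np.argmax(x_adv @ W + b, axis=1)[0]))
print("largest pixel change:", np.abs(nematode).max())
```

With your own model you would replace W, b and x with the trained weights and the image of the 2, and set target to 6. In the TensorFlow graph from the question, the same quantity should be what sess.run returns for tf.gradients(fake_loss, x), so the update there would subtract eps times the sign of that value instead of adding the raw gradient.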

Let me know if this helps you!