machine-learning, deep-learning, cleverhans, adversarial-machines

Extracting original image format after adversarial attack with Cleverhans


Suppose I load the MNIST dataset with Cleverhans and attack an image with FGM. Any image loaded via the Cleverhans MNIST dataset already has its pixel values constrained to [0, 1], and the same is true after the attack (assuming I clip the result to [0, 1]). If I want to view the attack in this case, I would just multiply all the pixel values by 255 to create the adversarial image.

In this scenario, the original MNIST image with pixel values in [0, 255] has been rescaled to [0, 1] by dividing every value by 255, so to get back the original "image properties" I just multiply by 255 again.
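Concretely, for MNIST this inversion is trivial, something like the following sketch (the helper name is just mine):

    import numpy as np

    def to_uint8(x):
        # Undo the MNIST scaling: map a [0, 1] float image back to [0, 255] uint8.
        return np.clip(np.rint(x * 255.0), 0, 255).astype(np.uint8)

    # x: clean image in [0, 1]; adv_x: adversarial image in [0, 1] (e.g. from FGM)
    # orig_img = to_uint8(x)
    # adv_img  = to_uint8(adv_x)
    # noise    = adv_img.astype(np.int16) - orig_img.astype(np.int16)  # signed perturbation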

Is there a way (in Cleverhans, or in general) to extract the original image properties when this preprocessing step (in MNIST's case, dividing by 255) is more complicated? For example, I am thinking of VGG16, where an ImageNet image is resized with its aspect ratio preserved, and the process of bringing the image back to its original size is involved and unique to each image.
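The only approach I can think of is recording, per image, the parameters of the preprocessing so it can be approximately undone afterwards, roughly along these lines (the resize-then-center-crop recipe below is my guess at a VGG16-style pipeline, not the exact one):

    import numpy as np
    from PIL import Image

    def preprocess(img, side=224):
        # Resize the shortest side to `side` (aspect ratio preserved), then
        # center-crop to side x side. Return the [0, 1] array plus the
        # per-image metadata needed to approximately undo the transform.
        w, h = img.size
        scale = side / min(w, h)
        new_w, new_h = round(w * scale), round(h * scale)
        left, top = (new_w - side) // 2, (new_h - side) // 2
        crop = img.resize((new_w, new_h), Image.BILINEAR).crop(
            (left, top, left + side, top + side))
        meta = {"orig_size": (w, h), "resized_size": (new_w, new_h), "box": (left, top)}
        return np.asarray(crop, np.float32) / 255.0, meta

    def _resize_float(channel, size):
        # PIL mode "F" resizes float arrays without a uint8 round-trip.
        return np.asarray(Image.fromarray(channel, mode="F").resize(size, Image.BILINEAR))

    def noise_on_original(noise, meta):
        # Map a perturbation computed on the crop back to the original
        # resolution. Pixels outside the crop get zero noise, and the
        # interpolation is lossy, so this is only an approximation.
        new_w, new_h = meta["resized_size"]
        left, top = meta["box"]
        side = noise.shape[0]
        canvas = np.zeros((new_h, new_w, noise.shape[2]), np.float32)
        canvas[top:top + side, left:left + side] = noise
        return np.stack([_resize_float(canvas[..., c], meta["orig_size"])
                         for c in range(noise.shape[2])], axis=-1)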

Is it possible to add this preprocessing step as a layer of the model, so that the attack directly produces the noise on the original image? I imagine this is generally not possible, since not all preprocessing steps are differentiable.
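For a purely differentiable preprocessing step (a plain bilinear resize, say) I can imagine folding it into the model so that gradients reach the original-resolution image, something like this sketch (TF2 Keras; `base_model` is a placeholder for the classifier):

    import tensorflow as tf

    def attackable_pipeline(base_model, side=224):
        # Fold differentiable preprocessing (rescaling + bilinear resize) into
        # the model itself, so a gradient-based attack on the wrapper yields
        # gradients w.r.t. the original-resolution image in [0, 255]. This only
        # works because every step here is differentiable; a JPEG decode or a
        # hard per-image crop choice would break the gradient chain.
        inp = tf.keras.Input(shape=(None, None, 3))  # original-size image
        x = tf.keras.layers.Lambda(
            lambda t: tf.image.resize(t / 255.0, (side, side)))(inp)
        return tf.keras.Model(inp, base_model(x))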

Does this mean I am unable to view the noise as applied to the original image if the preprocessing step is too complicated?


Solution

  • That's correct: if your pipeline uses a pre-processing stage that is non-differentiable, gradient-based attacks cannot backpropagate through it, so you cannot compute the noise directly on the original image.

    However, you can use attacks that do not compute gradients directly, like SPSA, to operate in the original domain directly, even if the pre-processing stage is non-differentiable: https://github.com/tensorflow/cleverhans/blob/master/cleverhans/attacks/spsa.py
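
    For illustration, a rough sketch of running SPSA over the original domain, assuming the CleverHans v3 attack interface (`model` is a placeholder for a cleverhans.model.Model that wraps your full pipeline, preprocessing included; parameter names can differ between versions):

        import tensorflow as tf
        from cleverhans.attacks import SPSA

        # `model` wraps the WHOLE pipeline, non-differentiable preprocessing
        # included: SPSA only needs forward passes, so nothing here has to
        # be differentiable.
        sess = tf.Session()
        x = tf.placeholder(tf.float32, shape=(1, 28, 28, 1))  # original-domain input (MNIST example)
        y = tf.placeholder(tf.int64, shape=(1,))

        spsa = SPSA(model, sess=sess)
        adv_x = spsa.generate(
            x, y=y,
            eps=0.3,           # perturbation budget, measured in the original domain
            nb_iter=100,       # optimization steps
            spsa_samples=32,   # samples per gradient-free gradient estimate
            clip_min=0.0, clip_max=1.0)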