image machine-learning computer-vision caffe conv-neural-network

How does mean image subtraction work?


To preface, I am new to the field of ML/CV, and am currently in the process of training a custom conv net using Caffe.

I am interested in mean image subtraction to achieve basic data normalization on my training images. However, I am confused as to how mean subtraction works and exactly what benefits it has.

I know that a "mean image" can be calculated from the training set, which is then subtracted from the training, validation, and testing sets to make the network less sensitive to differing backgrounds and lighting conditions.

Does this involve calculating the mean of all pixels in each image and then averaging those? Or is the value at each pixel coordinate averaged across all images in the set (i.e., the average value of the pixel at location (1,1) across all images)? The latter would seem to require that all images are the same size...

Also, for colored images (3-channels), is the value for each channel individually averaged?

Any clarity would be appreciated.


Solution

  • In deep learning, there are in fact different practices as to how to subtract the mean image.

    Subtract mean image

    The first way is to subtract the mean image, as @lejlot described. But there is an issue if your dataset images are not all the same size: you need to make sure every image has the same dimensions before using this method (e.g., resize the original image, or crop same-size patches from it). This approach is used in the original ResNet paper; see the reference here.
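    A minimal NumPy sketch of this first method (the random arrays below just stand in for a real, same-sized training set):

    ```python
    import numpy as np

    # Toy stack of N same-sized images: shape (N, H, W, C).
    rng = np.random.default_rng(0)
    train = rng.random((10, 32, 32, 3)).astype(np.float32)

    # The mean image keeps the H x W x C shape: average over the image axis only,
    # so each pixel location gets its own mean value.
    mean_image = train.mean(axis=0)        # shape (32, 32, 3)

    # Subtract the SAME mean image from train, validation, and test images.
    train_centered = train - mean_image

    # After centering, every pixel location averages to ~0 across the set.
    print(mean_image.shape)
    ```

    Note that `train - mean_image` only broadcasts because every image matches the mean image's height and width, which is exactly why this method requires same-sized inputs.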

    Subtract the per-channel mean

    The second way is to subtract the per-channel mean from the original image, which is more popular. With this approach you do not need to resize or crop the original image; you simply calculate one mean per channel over the training set. This is widely used in deep learning, e.g., Caffe: here and here. Keras: here. PyTorch: here. (PyTorch also divides each per-channel value by the channel's standard deviation.)
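    A sketch of the per-channel variant, again with random arrays standing in for real data; note the images can have different sizes, since only three scalar means are computed:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    # Images of different sizes are fine here.
    images = [rng.random((h, w, 3)).astype(np.float32)
              for h, w in [(32, 32), (48, 64), (28, 40)]]

    # Accumulate per-channel sums and the total pixel count over the training set.
    channel_sum = np.zeros(3, dtype=np.float64)
    pixel_count = 0
    for img in images:
        channel_sum += img.reshape(-1, 3).sum(axis=0)
        pixel_count += img.shape[0] * img.shape[1]

    # One scalar mean per channel (e.g., per R, G, B).
    channel_mean = channel_sum / pixel_count     # shape (3,)

    # Subtraction broadcasts the (3,) mean over every pixel of any-sized image.
    centered = [img - channel_mean for img in images]
    print(channel_mean.shape)
    ```

    Because the mean is just three numbers, it broadcasts against any H x W x 3 image, which is why this method imposes no size constraint on the dataset.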