I have a question about the mean and standard deviation parameters in image augmentation.
Is it recommended to set these two parameters?
If so, how do I find the right values? Do I have to iterate over the whole dataset, channel by channel, before training to compute them?
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transform = A.Compose(
    [
        A.Resize(height=IMAGE_HEIGHT, width=IMAGE_WIDTH),
        A.ColorJitter(brightness=0.3, hue=0.3, p=0.3),
        A.Rotate(limit=5, p=1.0),
        # A.HorizontalFlip(p=0.3),
        # A.VerticalFlip(p=0.2),
        A.Normalize(
            mean=[0.0, 0.0, 0.0],  # <----------- this parameter
            std=[1.0, 1.0, 1.0],  # <----------- this parameter
            max_pixel_value=255.0,
        ),
        ToTensorV2(),
    ],
)
Yes, normalizing your images is strongly recommended in most cases, although you will occasionally meet situations that do not require it. The point is to keep the input values in a known range. The output of a network, even a 'big' one, is strongly influenced by the range of its input data. If the input range is left uncontrolled, predictions can change drastically from one input to the next, so the gradients can get out of control too, which makes training inefficient. I invite you to read this and that answer for more detail on the 'why' behind normalization and a deeper understanding of its behaviour.
It is quite common to normalize images with the ImageNet mean and standard deviation: mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]. These values are expressed for pixel values scaled to [0, 1], which is what max_pixel_value=255.0 takes care of. Of course, if your dataset is realistic enough for your production context, you could also use its own mean and std instead of ImageNet's.
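If you do want statistics from your own dataset, one pass over the training images is enough. Below is a minimal sketch, assuming the images are available as HxWxC uint8 NumPy arrays; the helper name channel_mean_std is hypothetical and not part of albumentations. It accumulates per-channel sums and sums of squares, so the whole dataset never has to be held in memory at once.

```python
import numpy as np


def channel_mean_std(images, max_pixel_value=255.0):
    """Per-channel mean and std over an iterable of HxWxC arrays.

    Hypothetical helper: pixel values are scaled to [0, 1] first, so the
    result can be passed to a normalization step that uses the same
    max_pixel_value.
    """
    pixel_count = 0
    channel_sum = None
    channel_sq_sum = None

    for image in images:
        # Flatten to (num_pixels, channels) and scale to [0, 1].
        pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
        pixels /= max_pixel_value

        if channel_sum is None:
            channel_sum = np.zeros(pixels.shape[1])
            channel_sq_sum = np.zeros(pixels.shape[1])

        channel_sum += pixels.sum(axis=0)
        channel_sq_sum += (pixels ** 2).sum(axis=0)
        pixel_count += pixels.shape[0]

    if pixel_count == 0:
        raise ValueError("no images were provided")

    mean = channel_sum / pixel_count
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    variance = np.maximum(channel_sq_sum / pixel_count - mean ** 2, 0.0)
    return mean, np.sqrt(variance)
```

Compute these on the training split only, before any augmentation, then pass mean.tolist() and std.tolist() as the mean and std arguments.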
Finally, keep those values at hand: once your model is trained, you still need to normalize every new image with the same mean and std to get good accuracy at inference time.
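One way to make sure training and inference share the same statistics is to define them once and reuse them. The sketch below is plain NumPy rather than albumentations code; it assumes the formula given in the A.Normalize documentation, img = (img - mean * max_pixel_value) / (std * max_pixel_value), and the names MEAN, STD and normalize are chosen here for illustration only.

```python
import numpy as np

# Single source of truth for the statistics (ImageNet values here).
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])
MAX_PIXEL_VALUE = 255.0


def normalize(image, mean=MEAN, std=STD, max_pixel_value=MAX_PIXEL_VALUE):
    """Normalize an HxWxC image per channel.

    Assumed to mirror the formula documented for A.Normalize:
    (img - mean * max_pixel_value) / (std * max_pixel_value)
    """
    image = image.astype(np.float32)
    return (image - mean * max_pixel_value) / (std * max_pixel_value)
```

With mean = [0, 0, 0] and std = [1, 1, 1], as in the question, this formula reduces to dividing by 255, so the images are only rescaled to [0, 1] and not standardized.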