I am trying to recreate the data preprocessing on the ImageNet data set done in the original publication "Deep Residual Learning for Image Recognition". As stated in section 3.4 of their paper: "Our implementation for ImageNet follows the practice in [21, 41]. The image is resized with its shorter side randomly sampled in [256, 480] for scale augmentation [41]. A 224×224 crop is randomly sampled from an image or its horizontal flip, with the per-pixel mean subtracted [21]. The standard color augmentation in [21] is used."
I have figured out the part about randomly cropping a 224×224 patch from the original image or its horizontal flip. The other two parts I have not: resizing the image with its shorter side randomly sampled in [256, 480] for scale augmentation, and the standard color augmentation from [21].
For the first one, I can't find a "random resize" function in torchvision's transforms. The second, according to the referenced [21], is to "perform PCA on the set of RGB pixel values throughout the ImageNet training set". Please refer to the "Data Augmentation" section of "ImageNet Classification with Deep Convolutional Neural Networks" for the full explanation.
How would I recreate this type of preprocessing?
The first one needs three transforms combined: RandomChoice, Resize and RandomCrop.
import torchvision.transforms as transforms

# Randomly pick a target size for the shorter side, then take a random 224x224 crop
transforms.Compose([
    transforms.RandomChoice([transforms.Resize(256), transforms.Resize(480)]),
    transforms.RandomCrop(224),
])
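Note that RandomChoice only ever picks 256 or 480, while the paper samples the shorter side anywhere in [256, 480]. A minimal sketch that matches this more closely, assuming a small custom transform (RandomShortSideResize is my own name, not a torchvision class; it relies on torchvision.transforms.functional.resize scaling the shorter side when given an int size):

import random
import torchvision.transforms as transforms
import torchvision.transforms.functional as F

class RandomShortSideResize:
    """Resize so the shorter side becomes a random integer in [min_size, max_size]."""
    def __init__(self, min_size=256, max_size=480):
        self.min_size, self.max_size = min_size, max_size

    def __call__(self, img):
        size = random.randint(self.min_size, self.max_size)  # inclusive on both ends
        return F.resize(img, size)  # int size => shorter side resized to `size`

train_transform = transforms.Compose([
    RandomShortSideResize(256, 480),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(224),
])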
For the second one, what you're looking for is the PCA color augmentation from [21] (see the sketch after the Normalize line), but officially PyTorch (and practically everybody else) simply uses this:
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
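If you do want to reproduce the color augmentation from [21] ("fancy PCA") anyway, here is a minimal sketch. It assumes a (3, H, W) tensor in [0, 1] (i.e. after ToTensor()); the class name Lighting is mine, and the eigenvalues/eigenvectors are the values commonly quoted for the ImageNet RGB covariance (e.g. in Facebook's fb.resnet.torch Lighting transform), not something computed here:

import torch

class Lighting:
    """AlexNet-style PCA color augmentation ("fancy PCA")."""
    # Eigenvalues/eigenvectors of the ImageNet RGB pixel covariance as commonly
    # quoted (assumed here, borrowed from fb.resnet.torch), for images in [0, 1].
    eigval = torch.tensor([0.2175, 0.0188, 0.0045])
    eigvec = torch.tensor([[-0.5675,  0.7192,  0.4009],
                           [-0.5808, -0.0045, -0.8140],
                           [-0.5836, -0.6948,  0.4203]])

    def __init__(self, alphastd=0.1):
        self.alphastd = alphastd

    def __call__(self, img):                  # img: (3, H, W) tensor in [0, 1]
        alpha = torch.randn(3) * self.alphastd                 # alpha_i ~ N(0, 0.1)
        rgb = (self.eigvec * self.eigval * alpha).sum(dim=1)   # [p1 p2 p3] @ (alpha * lambda)
        return img + rgb.view(3, 1, 1)

You would drop it into the Compose right after ToTensor() and before Normalize.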
And if you think that plain Normalize is too simple, the standard TensorFlow pre-processing is just
x /= 127.5
x -= 1.
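which just maps pixel values from [0, 255] to [-1, 1]. If you wanted the same scaling inside a torchvision pipeline, a quick sketch (mine, not an official recipe) would be:

import torchvision.transforms as transforms

tf_style = transforms.Compose([
    transforms.ToTensor(),                       # uint8 [0, 255] -> float [0, 1]
    transforms.Lambda(lambda x: x * 2.0 - 1.0),  # [0, 1] -> [-1, 1], same as x / 127.5 - 1
])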