I am trying to train an autoencoder on the MNIST set, where the digits are supposed to have a random translation applied to them. Using the torch transforms, I can resize and translate, but this doesn't have the desired effect (the digit gets translated out of frame). Does anyone here know of a transform or some other method that would allow me to get a smaller digit, randomly translated?
I have tried to do so manually using the following code:
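import random
import numpy as np
image = dataset[0][0][0]  # first sample, first channel: a 28x28 digit
background = np.zeros((56, 56))  # empty canvas, twice the digit size
topLeft = (random.randint(0, 27), random.randint(0, 27))  # random top-left corner that keeps the digit in frame
background[topLeft[0]:topLeft[0]+28, topLeft[1]:topLeft[1]+28] = image  # paste the whole digit, not a single pixel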
but I am unable to do this transformation on the actual MNIST set. Any help would be greatly appreciated.
I have done it with the RandomAffine transform:
from PIL import Image
from pathlib import Path
import matplotlib.pyplot as plt
import torch
from torchvision.transforms import v2
plt.rcParams["savefig.bbox"] = 'tight'
torch.manual_seed(0)
# you can download the assets and the
# helpers from https://github.com/pytorch/vision/tree/main/gallery/
from helpers import plot
orig_img = Image.open(Path('gallery/assets/astronaut.jpg'))
affine_transformer = v2.RandomAffine(degrees=0, translate=(0.1, 0.3), scale=(0.5, 0.5))
affine_imgs = [affine_transformer(orig_img) for _ in range(4)]
plot([orig_img] + affine_imgs)
On top of this, you can also resize the image to 56x56. The torchvision documentation for RandomAffine has more details; you can play with the translate and scale params to shift the image away from the center.
I hope this helps