[SOLVED] Adding random positional variance to the MNIST dataset

I am trying to train an autoencoder on the MNIST set, where the digits are supposed to have a random translation applied to them. Using the torch transforms, I can resize and translate, but this doesn't have the desired effect (the digit gets translated out of frame). Does anyone here know of a transform or some other method that would give me a smaller digit, randomly translated within the frame?

I have tried to do so manually using the following code:

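import random
import numpy as np

image = dataset[0][0][0]  # first sample, first channel: a 28x28 array
background = np.zeros((56, 56))
# random top-left corner; 56 - 28 = 28 is the largest offset that keeps the digit in frame
top, left = random.randint(0, 28), random.randint(0, 28)
background[top:top + 28, left:left + 28] = image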

but I am unable to do this transformation on the actual MNIST set. Any help would be greatly appreciated.
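
For reference, the manual placement above could be packaged as a per-sample function along these lines. This is only a sketch: `place_on_canvas` and its parameters are hypothetical names (not part of torchvision), and it assumes each digit arrives as a single 2-D 28x28 array.

```python
import numpy as np


def place_on_canvas(image, canvas_size=56, rng=None):
    """Paste a 2-D image at a random position on a zero-filled square canvas.

    Hypothetical helper for illustration; not a torchvision function.
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image)
    h, w = image.shape
    # Upper bound is exclusive, so offsets range over 0..canvas_size - h,
    # which keeps the whole digit inside the canvas.
    top = rng.integers(0, canvas_size - h + 1)
    left = rng.integers(0, canvas_size - w + 1)
    canvas = np.zeros((canvas_size, canvas_size), dtype=image.dtype)
    canvas[top:top + h, left:left + w] = image
    return canvas


# Stand-in for one MNIST digit (all ones, so the pasted area is easy to check).
digit = np.ones((28, 28), dtype=np.float32)
shifted = place_on_canvas(digit, rng=np.random.default_rng(0))
print(shifted.shape)
```

Assuming the dataset already converts images to tensors, a function like this could probably be applied per sample through something like torchvision's `transforms.Lambda`, converting back to a tensor afterwards.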

Solution

I did this with an affine transform (RandomAffine):

from PIL import Image
from pathlib import Path
import matplotlib.pyplot as plt

import torch
from torchvision.transforms import v2

plt.rcParams["savefig.bbox"] = 'tight'


torch.manual_seed(0)

# you can download the assets and the
# helpers from https://github.com/pytorch/vision/tree/main/gallery/
from helpers import plot
orig_img = Image.open(Path('gallery/assets/astronaut.jpg'))

# degrees=0 disables rotation; scale=(0.5, 0.5) always shrinks the image to half size
affine_transformer = v2.RandomAffine(degrees=0, translate=(0.1, 0.3), scale=(0.5, 0.5))
affine_imgs = [affine_transformer(orig_img) for _ in range(4)]
plot([orig_img] + affine_imgs)

On top of this, you can also resize the image to 56x56.
The torchvision transforms documentation has more details. You can adjust the translate and scale parameters to control how far the image is shifted from the center.

I hope this helps