pytorch, generative-adversarial-network, dcgan

How to feed an image to the Generator in a GAN (PyTorch)


So, I'm training a DCGAN model in PyTorch on the CelebA dataset (faces). Here is the architecture of the generator:

Generator(
  (main): Sequential(
    (0): ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace=True)
    (9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): ReLU(inplace=True)
    (12): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (13): Tanh()
  )
)

So after training, I want to check what the generator outputs if I feed it an occluded image like this one (size: 64x64): [occluded face image]

But as you might have guessed, the image has 3 channels while my generator accepts a 100-channel latent vector at its input, so what is the correct way to feed this image to the generator and check the output? (I'm expecting that the generator tries to generate only the occluded part of the image.) If you want reference code, try this demo file from PyTorch. I have modified that file for my own needs, but for reference it will do the trick.


Solution

  • You just can't do that directly. As you said, your network expects a 100-dimensional input, which is normally sampled from a standard normal distribution:

    [formula: z ~ N(0, I), a 100-dimensional latent vector]

    So the generator's job is to take this random vector and generate a 3x64x64 image that is indistinguishable from real images. The input is a random 100-dimensional vector sampled from a standard normal distribution. I don't see any way to feed your image into the current network without modifying the architecture and retraining a new model. If you want to try a new model, you can change the input to occluded images, apply some convolutional / linear layers to reduce the dimensions to 100, and then keep the rest of the network the same. This way the network will try to learn to generate images not from a latent vector but from a feature vector extracted from the occluded images. It may or may not work.
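
    For reference (not in the original post), this is roughly how the tutorial feeds the generator during ordinary DCGAN training: a fresh batch of 100-dimensional latent vectors is drawn from a standard normal distribution at every step.

    # b_size is the current batch size; netG is the DCGAN generator from the tutorial
    noise = torch.randn(b_size, 100, 1, 1, device=device)  # latent vectors ~ N(0, I)
    fake = netG(noise)                                      # batch of generated 3x64x64 images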

    EDIT: I've decided to give it a go and see if the network can learn with this type of conditioned input vector instead of a latent vector. I used the tutorial example you linked and added a couple of changes. First, a new network that takes the occluded image and reduces it to 100 dimensions:

    class ImageTransformer(nn.Module):
        def __init__(self):
            super(ImageTransformer, self).__init__()
            # Strided convolution: 3x64x64 input -> 1x32x32 feature map
            self.main = nn.Sequential(
                nn.Conv2d(3, 1, 4, 2, 1, bias=False),
                nn.LeakyReLU(0.2, inplace=True)
            )

            # Flattened 32*32 feature map -> 100-dimensional vector
            self.linear = nn.Linear(32*32, 100)

        def forward(self, input):
            out = self.main(input).view(input.shape[0], -1)
            # Reshape to (N, 100, 1, 1) so it can replace the generator's latent input
            return self.linear(out).view(-1, 100, 1, 1)
    

    Just a simple convolutional layer + LeakyReLU + a linear layer mapping to 100 dimensions at the output. Note that you could use a much better network here as a feature extractor; I just wanted to run a simple test.
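
    As a quick shape sanity check (a small sketch, not from the original answer): a 3x64x64 input passes through the strided convolution to 1x32x32, is flattened to 1024 features, and comes out as a (N, 100, 1, 1) tensor, which is exactly the shape the generator expects.

    netT = ImageTransformer()
    dummy = torch.randn(8, 3, 64, 64)   # pretend batch of 8 RGB 64x64 images
    z = netT(dummy)
    print(z.shape)                      # torch.Size([8, 100, 1, 1])

    So netT's output can be dropped in wherever the tutorial previously used the 100-dimensional noise tensor.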

    # Take a fixed batch of 64 real images and zero out a 24x24 patch in each of them
    fixed_input = next(iter(dataloader))[0][0:64, :, :, :]
    fixed_input[:, :, 20:44, 20:44] = torch.tensor(np.zeros((24, 24), dtype=np.float32))
    fixed_input = fixed_input.to(device)
    

    This is how I modify the tensor to add a black patch over the input. I just sampled a batch to create a fixed input for tracking progress, as is done in the tutorial with a fixed random vector.
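
    Since the same masking is applied again inside the training loop below, it could also be factored into a small helper (a sketch; the function name is made up and not part of the original code):

    def occlude(batch, x0=20, x1=44):
        """Return a copy of the batch with a square patch zeroed out."""
        occluded = batch.detach().clone()
        occluded[:, :, x0:x1, x0:x1] = 0.0
        return occluded

    The training loop below keeps the original inline version so that it stays as close to the tutorial as possible.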

    # Create the generator
    netG = Generator().to(device)
    netD = Discriminator().to(device)
    netT = ImageTransformer().to(device)
    
    # Apply the weights_init function to randomly initialize all weights
    #  to mean=0, stdev=0.02.
    netG.apply(weights_init)
    netD.apply(weights_init)
    netT.apply(weights_init)
    
    # Print the model
    print(netG)
    print(netD)
    print(netT)
    

    Most of the steps are the same; I just created an instance of the new transformer network. Finally, the training loop is slightly modified: the generator is not fed random vectors, but the outputs of the new transformer network.
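
    One detail this excerpt doesn't show is the loss/optimizer setup. If it follows the tutorial, it looks roughly like the sketch below; the important point is that netT's parameters also have to be handed to an optimizer (here folded into optimizerG), otherwise errG.backward() computes gradients for the transformer but nothing ever updates it and it stays at its random initialization.

    # assumes the tutorial's imports: torch.nn as nn, torch.optim as optim
    criterion = nn.BCELoss()
    real_label = 1.
    fake_label = 0.

    # Discriminator optimizer, as in the tutorial
    optimizerD = optim.Adam(netD.parameters(), lr=0.0002, betas=(0.5, 0.999))
    # Generator optimizer, extended with netT's parameters so the transformer is trained too
    optimizerG = optim.Adam(list(netG.parameters()) + list(netT.parameters()),
                            lr=0.0002, betas=(0.5, 0.999))

    With that in place, the modified training loop looks like this: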

    img_list = []
    G_losses = []
    D_losses = []
    iters = 0
    
    for epoch in range(num_epochs):
        for i, data in enumerate(dataloader, 0):
    
            ############################
            # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
            ###########################
            ## Train with all-real batch
            netD.zero_grad()
            # Occluded copy of the real batch; this is what conditions the generator
            transformed = data[0].detach().clone()
            transformed[:, :, 20:44, 20:44] = torch.tensor(np.zeros((24, 24), dtype=np.float32))
            transformed = transformed.to(device)
            real_cpu = data[0].to(device)
            b_size = real_cpu.size(0)
            label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
            output = netD(real_cpu).view(-1)
            errD_real = criterion(output, label)
            errD_real.backward()
            D_x = output.mean().item()
    
            ## Train with all-fake batch
            fake = netT(transformed)
            fake = netG(fake)
            label.fill_(fake_label)
            output = netD(fake.detach()).view(-1)
            errD_fake = criterion(output, label)
            errD_fake.backward()
            D_G_z1 = output.mean().item()
            errD = errD_real + errD_fake
            optimizerD.step()
    
            ############################
            # (2) Update G network: maximize log(D(G(z)))
            ###########################
            netG.zero_grad()
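            # Not in the original snippet: if netT's parameters are included in optimizerG
            # (see the optimizer sketch above), its gradients should be cleared here as well
            netT.zero_grad()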
            label.fill_(real_label)
            output = netD(fake).view(-1)
            errG = criterion(output, label)
            errG.backward()
            D_G_z2 = output.mean().item()
            optimizerG.step()
    
            # Output training stats
            if i % 50 == 0:
                print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                      % (epoch, num_epochs, i, len(dataloader),
                         errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
    
            # Save Losses for plotting later
            G_losses.append(errG.item())
            D_losses.append(errD.item())
    
            # Check how the generator is doing by saving G's output on fixed_input
            if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
                with torch.no_grad():
                    fake = netT(fixed_input)
                    fake = netG(fake).detach().cpu()
                img_list.append(vutils.make_grid(fake, padding=2, normalize=True))
    
            iters += 1
    

    Training was reasonably okay in terms of loss reduction etc. Finally, this is what I got after 5 epochs of training:
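
    Since the loop stores G_losses and D_losses, the loss curves can be plotted the same way the tutorial does (a sketch, assuming matplotlib is available):

    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 5))
    plt.title("Generator and Discriminator Loss During Training")
    plt.plot(G_losses, label="G")
    plt.plot(D_losses, label="D")
    plt.xlabel("iterations")
    plt.ylabel("loss")
    plt.legend()
    plt.show()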

    [inputs image]

    [outputs image]

    So what does this result tell us? Since the generator's inputs were not sampled from a normal distribution, the generator wasn't able to learn the distribution of faces well enough to create a varying range of output faces. And since the input is a conditioned feature vector, the range of output images is limited. So in summary: random inputs are required for the generator, even though it did learn to remove the patches :)