Tags: python, tensorflow, keras, generative-adversarial-network

Image dimension mismatch while trying to add Noise to image using Keras Sequential


To recreate this question's issue on your system, please find the source code and dataset here

What am I trying to do?
I am trying to create a simple GAN (Generative Adversarial Network) that recolors black-and-white images, using a few ImageNet images.


What process am I following?
I have taken a few dog images, which are stored in the ./ImageNet/dogs/ directory. Using Python code I then perform the following steps (a rough sketch of the first two steps is shown after the list):

  1. Resize the dog images to 224 x 224 resolution and save them in ./ImageNet/dogs_lowres/
  2. Convert the low-resolution images to grayscale and save them in ./ImageNet/dogs_bnw/
  3. Feed the low-resolution black-and-white images to the GAN model and generate colored images.
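
For context, here is a minimal sketch of what that preprocessing could look like with OpenCV (the helper name preprocess_dogs and the exact calls are my assumptions, not the original script):

import os
import cv2

def preprocess_dogs(src_dir='./ImageNet/dogs/',
                    lowres_dir='./ImageNet/dogs_lowres/',
                    bnw_dir='./ImageNet/dogs_bnw/'):
    # Hypothetical helper: resize each image to 224x224, then save a grayscale copy.
    os.makedirs(lowres_dir, exist_ok=True)
    os.makedirs(bnw_dir, exist_ok=True)
    for filename in os.listdir(src_dir):
        img = cv2.imread(os.path.join(src_dir, filename))
        if img is None:
            continue  # skip files OpenCV cannot read
        lowres = cv2.resize(img, (224, 224))
        cv2.imwrite(os.path.join(lowres_dir, filename), lowres)
        gray = cv2.cvtColor(lowres, cv2.COLOR_BGR2GRAY)
        cv2.imwrite(os.path.join(bnw_dir, filename), gray)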

Where am I stuck?
I am stuck at understanding how the image dimensions / shapes are used. I get the following error:

ValueError: `logits` and `labels` must have the same shape, received ((32, 28, 28, 3) vs (32, 224, 224)).

Here's the code for Generator and Discriminator:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Reshape, Conv2DTranspose, Flatten
from tensorflow.keras.optimizers import Adam

# GAN model for recoloring black and white images
generator = Sequential()
generator.add(Dense(7 * 7 * 128, input_dim=100))
generator.add(Reshape((7, 7, 128)))
generator.add(Conv2DTranspose(64, kernel_size=5, strides=2, padding='same'))
generator.add(Conv2DTranspose(32, kernel_size=5, strides=2, padding='same'))
generator.add(Conv2DTranspose(3, kernel_size=5, activation='sigmoid', padding='same'))
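# Note: with two stride-2 Conv2DTranspose layers the 7x7 feature map is upsampled to
# 14x14 and then 28x28, so the generator output shape is (batch, 28, 28, 3).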

# Discriminator model
discriminator = Sequential()
discriminator.add(Flatten(input_shape=(224, 224, 3)))
discriminator.add(Dense(1, activation='sigmoid'))
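# Note: this discriminator expects 224x224x3 colour inputs and is not used in the training loop shown below.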

# Compile the generator model
optimizer = Adam(learning_rate=0.0002, beta_1=0.5)
generator.compile(loss='binary_crossentropy', optimizer=optimizer)

# Train the GAN to recolor images
epochs = 10000
batch_size = 32

and the training loop is as follows:

for epoch in range(epochs):
    idx = np.random.randint(0, bw_images.shape[0], batch_size)
    real_images = bw_images[idx]

    noise = np.random.normal(0, 1, (batch_size, 100))
    generated_images = generator.predict(noise)

    # noise_rs = noise.reshape(-1, 1)
    g_loss = generator.train_on_batch(noise, real_images)

    if epoch % 100 == 0:
        print(f"Epoch: {epoch}, Generator Loss: {g_loss}")

Where is the error? I get the error on this line:
g_loss = generator.train_on_batch(noise, real_images)

When I check the shapes of the noise and real_images objects, this is what I get:

real_images.shape
(32, 224, 224)
noise.shape
(32, 100)
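
For completeness, the generator's output shape can also be checked directly:

print(generator.output_shape)
# (None, 28, 28, 3) -- which does not match real_images.shape of (32, 224, 224)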

Any help/suggestion is appreciated.


Solution

  • The generator outputs shape (32, 28, 28, 3), whereas it is given a target of shape (32, 224, 224). The target differs in two ways: it is grayscale rather than colour, and it has larger spatial dimensions.

    I am assuming the target supplied to the generator should be colour rather than grayscale. You can load the colour images and resize them using:

    import os
    import cv2
    import numpy as np

    def load_images_color(directory):
        images = []
        for filename in os.listdir(directory):
            img_path = os.path.join(directory, filename)
            img = cv2.imread(img_path)
            img = cv2.resize(img, (224, 224))  # Resize images to 224x224
            img = img.astype('float32') / 255.0  # Normalize pixel values
            images.append(img)
        return np.array(images)
    
    # Load colour images
    cl_images = load_images_color('./ImageNet/dogs')
    
    ...
    
    for epoch in range(epochs):
        ...
            
        cl_real = cl_images[idx]
        #Resize colour images to match generator output shape
        cl_real_small = []
        for im in cl_real:
            cl_real_small.append( cv2.resize(im, (28, 28)) )
        cl_real_small = np.array(cl_real_small)
        
        ...
    
        g_loss = generator.train_on_batch(noise, cl_real_small)
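
    An alternative, if you would rather keep the full-resolution colour targets, is to add more stride-2 Conv2DTranspose layers so the 7x7 feature map is upsampled all the way to 224x224 (7 → 14 → 28 → 56 → 112 → 224). A rough sketch of such a generator (the filter counts are my own choice, not from the original post):

    generator = Sequential()
    generator.add(Dense(7 * 7 * 128, input_dim=100))
    generator.add(Reshape((7, 7, 128)))
    generator.add(Conv2DTranspose(128, kernel_size=5, strides=2, padding='same'))  # 14x14
    generator.add(Conv2DTranspose(64, kernel_size=5, strides=2, padding='same'))   # 28x28
    generator.add(Conv2DTranspose(32, kernel_size=5, strides=2, padding='same'))   # 56x56
    generator.add(Conv2DTranspose(16, kernel_size=5, strides=2, padding='same'))   # 112x112
    generator.add(Conv2DTranspose(3, kernel_size=5, strides=2, activation='sigmoid', padding='same'))  # 224x224x3

    # With this output shape, cl_real (224x224x3) can be passed to train_on_batch
    # directly, without the cv2.resize step above.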