
How does the Mapping Network in StyleGAN work?


I am learning the StyleGAN architecture and got confused about the purpose of the Mapping Network. The original paper says:

Our mapping network consists of 8 fully-connected layers, and the dimensionality of all input and output activations— including z and w — is 512.

And there is no information about this network being trained in any way.

Like, wouldn’t it just generate some nonsense values?

I've tried creating a network like that (but with a smaller shape (16,)):

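import tensorflow as tf
import numpy as np

model = tf.keras.models.Sequential()
# shape must be a tuple: (16,) rather than (16)
model.add(tf.keras.Input(shape=(16,)))

# Stack of fully-connected layers (7 here; the paper's mapping network uses 8)
for i in range(7):
  model.add(tf.keras.layers.Dense(16, activation='relu'))

# No loss or optimizer is given, so the weights stay at their random initial values
model.compile()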

and then evaluated it on some random values:

g = tf.random.Generator.from_seed(34)
model(
    g.normal(shape=(16, 16))
)

And I am getting some random outputs like:

array([[0.        , 0.01045225, 0.        , 0.        , 0.02217731,
        0.00940356, 0.02321716, 0.00556996, 0.        , 0.        ,
        0.        , 0.03117323, 0.        , 0.        , 0.00734158,
        0.        ],
       [0.03159791, 0.05680077, 0.        , 0.        , 0.        ,
        0.        , 0.05907414, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.03110216, 0.04647615, 0.        ,
        0.04566741],
       .
       .  # More similar vectors go here
       .   
       [0.        , 0.01229661, 0.00056016, 0.        , 0.03534952,
        0.02654905, 0.03212402, 0.        , 0.        , 0.        ,
        0.        , 0.0913604 , 0.        , 0.        , 0.        ,
        0.        ]], dtype=float32)>

What am I missing? Is there any information on the Internet about training the Mapping Network? Any mathematical explanation? I got really confused :(


Solution

  • As I understand it, the mapping network is not trained separately. It is part of the generator network, and its weights are updated from gradients just like the other parts of the network.

    In the official StyleGAN generator code, the Generator is composed of two sub-networks: a mapping network and a synthesis network. This is much easier to see in the StyleGAN3 generator source, where the output of the mapping network is passed to the synthesis network, which generates the image.

    class Generator(torch.nn.Module):
        ...
        def forward(self, z, ...):
            ws = self.mapping(z, ...)
            img = self.synthesis(ws, ...)
            return img
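
To make the "trained through the same gradients" point concrete, here is a minimal NumPy sketch, not the StyleGAN code. The layer sizes, the single-layer "mapping" and "synthesis" stand-ins, and the squared-error loss are all assumptions made for illustration (a real GAN gets its loss from the discriminator). It only shows that a loss computed on the synthesis output yields a non-zero gradient for the mapping weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; the real networks are far larger.
latent_dim, out_dim, batch = 4, 3, 8

# Toy "mapping" (one linear layer + leaky ReLU) and toy "synthesis" (one linear layer).
W_map = rng.normal(size=(latent_dim, latent_dim)) * 0.5
W_syn = rng.normal(size=(latent_dim, out_dim)) * 0.5


def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)


def generator_loss(W_map, W_syn, z, target):
    """Stand-in loss on the generator output (an assumption, not a GAN loss)."""
    w = leaky_relu(z @ W_map)   # mapping: z -> w
    img = w @ W_syn             # synthesis: w -> "image"
    return 0.5 * np.mean(np.sum((img - target) ** 2, axis=1))


def mapping_gradient(W_map, W_syn, z, target, alpha=0.2):
    """Backpropagate the loss through synthesis into the mapping weights."""
    pre = z @ W_map
    w = leaky_relu(pre, alpha)
    img = w @ W_syn
    d_img = (img - target) / z.shape[0]           # dL/d(img)
    d_w = d_img @ W_syn.T                         # back through synthesis
    d_pre = d_w * np.where(pre > 0, 1.0, alpha)   # back through leaky ReLU
    return z.T @ d_pre                            # dL/d(W_map)


z = rng.normal(size=(batch, latent_dim))
target = rng.normal(size=(batch, out_dim))

grad = mapping_gradient(W_map, W_syn, z, target)
loss_before = generator_loss(W_map, W_syn, z, target)
W_map_updated = W_map - 0.01 * grad   # one small gradient-descent step on the mapping weights
loss_after = generator_loss(W_map_updated, W_syn, z, target)
print("gradient norm for mapping weights:", np.linalg.norm(grad))
print("loss before / after one step:", loss_before, loss_after)
```

In the question's snippet no such loss or update exists, which is why the outputs stay at whatever the random initial weights produce.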
    

    The diagram below, from the 2019 StyleGAN paper, shows the mapping network. Section 2 of the paper describes it.

    [Figure: Generator diagram with mapping layer]

    The mapping network is denoted f in the paper. It takes a noise vector z sampled from a normal distribution and maps it to an intermediate latent representation w. It is implemented as an 8-layer MLP, and the StyleGAN mapping network implementation likewise sets the number of MLP layers to 8.
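
As a rough illustration of f, here is a forward-pass sketch in NumPy. It is not the official implementation: the helper names (`init_mapping`, `mapping_forward`) are made up for this example, the He-style initialisation is an assumption, and the input normalisation and leaky ReLU slope of 0.2 follow what I recall of the official code, so treat them as assumptions too. Weights here are random, so the output w is only meaningful after training.

```python
import numpy as np


def pixel_norm(x, eps=1e-8):
    """Scale each latent vector so its mean squared value is about 1."""
    return x / np.sqrt(np.mean(x ** 2, axis=1, keepdims=True) + eps)


def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)


def init_mapping(rng, dim=512, num_layers=8):
    """Random (weight, bias) pairs for each fully-connected layer (He-style init, an assumption)."""
    return [
        (rng.normal(size=(dim, dim)) * np.sqrt(2.0 / dim), np.zeros(dim))
        for _ in range(num_layers)
    ]


def mapping_forward(params, z):
    """f: z -> w, a plain MLP whose input and output widths are equal."""
    x = pixel_norm(z)
    for weight, bias in params:
        x = leaky_relu(x @ weight + bias)
    return x


rng = np.random.default_rng(0)
params = init_mapping(rng, dim=512, num_layers=8)
z = rng.normal(size=(4, 512))    # a batch of 4 noise vectors
w = mapping_forward(params, z)
print(w.shape)  # (4, 512)
```

Passing `num_layers=2` to `init_mapping` gives the shallower variant; nothing else in the sketch changes.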

    In section 4 they mention,

    a common goal is a latent space that consists of linear subspaces, each of which controls one factor of variation. However, the sampling probability of each combination of factors in Z needs to match the corresponding density in the training data.

    A major benefit of our generator architecture is that the intermediate latent space W does not have to support sampling according to any fixed distribution.

    So z and w have the same dimensionality, but w is more disentangled than z. Finding the w in the intermediate latent space W that corresponds to a given image makes targeted edits to that image possible.
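
A minimal sketch of what such an edit can look like, assuming a direction in W that controls one attribute is already known. Both `w` and `direction` are random stand-ins here, and `edit_latent` is a hypothetical helper. In practice `w` would come from inverting an image and the direction from some separate analysis, and the edited vector would then be fed to the synthesis network.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 512

w = rng.normal(size=dim)                  # stand-in for the w recovered for an image
direction = rng.normal(size=dim)          # hypothetical attribute direction in W
direction /= np.linalg.norm(direction)    # unit length, so strength is in units of distance


def edit_latent(w, direction, strength):
    """Move w along one direction, leaving every orthogonal component untouched."""
    return w + strength * direction


w_edited = edit_latent(w, direction, strength=3.0)

# The change lies entirely along the chosen direction.
shift = (w_edited - w) @ direction
residual = np.linalg.norm((w_edited - w) - shift * direction)
print(shift, residual)
```

If W really is closer to a set of linear subspaces, one per factor of variation, then a move like this should change that one factor while mostly leaving the others alone.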

    From the Encoder for Editing paper:

    [Figure from the Encoder for Editing paper]

    In the StyleGAN2-ADA paper, among other changes, they found that a mapping network depth of 2 worked better than 8. In the StyleGAN3 mapping network implementation, the default number of layers is set to 2.
