I want to train a Variational Autoencoder with both labeled samples and unlabeled samples


The images look like this: [image: fake generated image, no label] [image: real image, with label]

I have 12000 fake images with no labels, which I generated based on bright spots on an image. I also have 1200 real images that have annotations and true labels. I want the output to be labeled generated images, and I would like to know how to proceed.

I want my Variational Autoencoder to generate labeled images that are real or close to real. I thought of two options. The first is to train on the 12000 fake images and test on the 1200 real images since, as you can see from the example, some of them match. The second is to downsample the 12000 fake images and train in a semi-supervised way, with both unlabeled and labeled samples.


Solution

  • Autoencoders generally do not produce labels; they attempt to reconstruct what you train them on. You need a grouping mechanism to create 'labels' for your data. To do this, perform the following:

    1. Train the autoencoder on either all of the data or only the labeled data (perform your splits as normal); which one depends on what you are trying to obtain. In your case I think you want to 'generate' new images from your fake images that are close to your real images, so use only the labeled data for training.

    2. Take the encoder output for all your labeled data and train a grouping algorithm on it, such as k-means or another network (perform your train/validation split here as well).

    2b) You could also run all your data through the encoder and do k-means on the result. Maybe there is an additional group?

    3. Label your unlabeled data by passing it through encoder -> grouping.
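
The encoder -> grouping part of the steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: `encode` is a hypothetical placeholder for your trained encoder, the toy 2-D points stand in for real latent codes, and the small pure-Python k-means is there only to keep the example self-contained (in practice you would more likely use a library implementation). The majority vote that maps each cluster to a label is also an assumption of this sketch.

```python
import math
from collections import Counter

def encode(image):
    # Hypothetical placeholder for the trained encoder (image -> latent vector).
    # The toy "images" below are already 2-D points, so they pass through unchanged.
    return list(image)

def nearest(z, centroids):
    # Index of the centroid closest to latent vector z (Euclidean distance).
    return min(range(len(centroids)), key=lambda i: math.dist(z, centroids[i]))

def kmeans(points, k, iters=20):
    # Plain Lloyd's algorithm, initialised from the first k points.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def name_clusters(latents, labels, centroids):
    # Give each cluster the majority label of the labeled points assigned to it.
    votes = {}
    for z, y in zip(latents, labels):
        votes.setdefault(nearest(z, centroids), Counter())[y] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

# Made-up toy data: two labeled classes and two unlabeled samples.
labeled = [((0.0, 0.1), "a"), ((5.0, 5.1), "b"), ((0.2, 0.0), "a"), ((5.2, 4.9), "b")]
unlabeled = [(0.1, 0.2), (4.8, 5.3)]

# Encode the labeled data and fit the grouping on the latent codes.
latents = [encode(x) for x, _ in labeled]
centroids = kmeans(latents, k=2)
cluster_to_label = name_clusters(latents, [y for _, y in labeled], centroids)

# Label the unlabeled data by passing it through encoder -> grouping.
# .get() returns None for a cluster that received no labeled points.
pseudo_labels = [cluster_to_label.get(nearest(encode(x), centroids)) for x in unlabeled]
print(pseudo_labels)
```

A cluster that ends up with no labeled points gets `None` here, which is one way the "additional group" mentioned above could show up.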