machine-learning, deep-learning, autoencoder, image-classification, semisupervised-learning

How to classify images with a Variational Autoencoder


I have trained a variational autoencoder on both my labeled images (1200) and my unlabeled images (4000), and I have the two models saved separately (vae_fake_img and vae_real_img). Now I'm wondering what to do next. I know Variational Autoencoders are not directly useful for a classification task, but feature extraction seems worth trying. Here are my attempts:

  1. Label my unlabeled data with k-means clustering fit on the latent space of the labeled images.
  2. My supervisor suggested training the VAE on the unlabeled images, visualizing the latent space with t-SNE, clustering with k-means, and then training an MLP for the final prediction (see the first sketch below).
  3. Train a Conditional VAE to create more labeled samples, retrain the VAE, take the (64, 64, 3) reconstruction output, and feed it through the last three fully connected (FC) layers of the VGG16 architecture for the final classification, as done in this paper: Encoder as feature extraction paper (see the second sketch below).
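
For attempts 1 and 2, here is a minimal sketch of the latent-space pipeline. It assumes a trained encoder that returns the latent mean per image; `encoder`, `x_labeled`, `y_labeled`, and `x_unlabeled` are hypothetical names for your own objects, with integer class labels:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.neural_network import MLPClassifier

# Assumed placeholders: `encoder` is your trained VAE encoder returning the
# latent mean per image; `x_labeled`, `y_labeled`, `x_unlabeled` are your data.
z_labeled = encoder.predict(x_labeled)      # shape (1200, latent_dim)
z_unlabeled = encoder.predict(x_unlabeled)  # shape (4000, latent_dim)

# t-SNE is for visual inspection only; do the clustering in the original
# latent space, not in the 2-D embedding.
z_2d = TSNE(n_components=2).fit_transform(z_unlabeled)
plt.scatter(z_2d[:, 0], z_2d[:, 1], s=3)
plt.show()

# Fit k-means on the labeled latents, map each cluster to the majority true
# label among its labeled members, then pseudo-label the unlabeled latents.
n_classes = len(np.unique(y_labeled))
kmeans = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit(z_labeled)
cluster_to_label = {
    c: np.bincount(y_labeled[kmeans.labels_ == c]).argmax()
    for c in range(n_classes)
}
pseudo = np.array([cluster_to_label[c] for c in kmeans.predict(z_unlabeled)])

# Train an MLP on labeled + pseudo-labeled latents for the final prediction.
z_all = np.vstack([z_labeled, z_unlabeled])
y_all = np.concatenate([y_labeled, pseudo])
mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500).fit(z_all, y_all)
```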

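For attempt 3, a sketch of attaching VGG16's three FC layers on top of the (64, 64, 3) reconstructions. This is one possible reading of that setup, not necessarily the paper's exact architecture; `n_classes` and the commented training call are placeholders:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

n_classes = 10  # placeholder: set to your number of classes

# VGG16 convolutional base, no top; input matches the (64, 64, 3) reconstructions.
base = VGG16(weights="imagenet", include_top=False, input_shape=(64, 64, 3))
base.trainable = False  # train only the new FC head

# Rebuild VGG16's last three FC layers, resizing the final one to n_classes.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dense(4096, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(vae.predict(x_train), y_train, ...)  # train on reconstructions
```
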
I have tried so many methods for my thesis, and I really need to achieve high accuracy if I want to get a job at my current internship, so any suggestion or guidance is highly appreciated. I've read many autoencoder papers, but the architecture for classification is never fully explained (or I'm not understanding it properly). I want to know which part of the VAE holds the most information for multiclass classification; I believe the encoder's latent space carries more useful information than the decoder's reconstruction. In short: which part of the autoencoder gives the better features for a final classifier?


Solution

  • In the case of autoencoders, you don't need labels to reconstruct the input data. So I think these approaches might make slight improvements:

    Don't forget batch normalization and dropout layers. P.S.: the most meaningful layer of an AE is the latent space.
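
    For example, a small classification head on the latent codes, with batch normalization and dropout as suggested above. This is only a sketch; `latent_dim`, `n_classes`, and `encoder` are placeholders for your own setup:

```python
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 32  # placeholder: your VAE's latent dimension
n_classes = 10   # placeholder: your number of classes

# Classifier trained on latent codes z = encoder(x), the most informative
# part of the AE for downstream classification.
clf = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.BatchNormalization(),  # stabilizes training
    layers.Dropout(0.3),          # regularizes the small labeled set
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(n_classes, activation="softmax"),
])
clf.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
# clf.fit(encoder.predict(x_labeled), y_labeled, validation_split=0.2)
```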