machine-learningimage-processingcomputer-visioncluster-analysisunsupervised-learning

Clustering images based on their similarity


I am facing a problem of image clustering based on their similarity, without knowing the number of clusters. Ideally i would like to achieve something that resembles this http://cs231n.github.io/assets/cnnvis/tsne.jpeg (http://cs231n.github.io/understanding-cnn/ this picture is a result of convolution neural network and it represents groups it learnt)

Because I am not interested in their classification (I don't know classes), i am mostly interested in their 'visual' properties: colours, shapes, gradients etc. I have found number of articles suggesting algorithms like DBSCAN, t-SNE or even k-means but is there some better solution? There were suggestions of using HOG transformation but to be honest, no idea how to stitch it all together.

So, to summaries, how can I segregate (on 2D plane, into groups, folders, whatever) images based on their colours and shape properties?


Solution

  • t-SNE is actually perfect for the thing you are trying to do.

    t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets.

    You can read more about it here.

    As always sklearn has a very user-friendly TSNE object to quickly try it out.

    I hope this helps...