python k-means flann cbir

Bag of Visual Words (obtained from features) for CBIR. Steps?


I'm very confused about the steps to follow to use BOVW for CBIR. I found a lot of literature about classification, machine learning and SVMs, but that is not quite what I'm looking for.
My problem is image similarity search: given a query image, find the most similar images in a database.

My steps so far:

  1. extract features (example: ORB, BRISK, SIFT...).
  2. store all images' features to disk.
  3. read features and calculate K-means in order to obtain centroids (my vocabulary, right?)
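Steps 1 to 3 could be sketched with sklearn roughly as follows. This is only a minimal sketch: the random arrays stand in for real descriptors loaded from disk, and the names `descriptors_per_image` and `vocabulary`, the descriptor size of 32 and the choice of 8 clusters are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical stand-in for step 2: one (n_i, 32) float array per image.
# In practice these would be the descriptors read back from disk.
descriptors_per_image = [
    rng.random((int(rng.integers(40, 60)), 32)).astype(np.float32)
    for _ in range(5)
]

# Step 3: stack the descriptors of all images and cluster them.
all_descriptors = np.vstack(descriptors_per_image)
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(all_descriptors)

# The centroids are the visual words, i.e. the vocabulary.
vocabulary = kmeans.cluster_centers_
print(vocabulary.shape)
```

The vocabulary is computed once over the descriptors of the whole database, not once per image.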

And now I'm stuck. I found many different ways to proceed.

This is my hypothesis:

  4. for each k-means centroid, compute the nearest neighbours (FLANN?)
  5. build a histogram from the set of nearest neighbours
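One common way to realise steps 4 and 5 is to assign every descriptor of an image to its nearest centroid and count how often each centroid is hit. The sketch below assumes that reading; `bovw_histogram` is a hypothetical helper, the data is random stand-in data, and cosine similarity is just one possible way to compare histograms. `KMeans.predict` does the nearest-centroid lookup here, so FLANN is not strictly required for a small vocabulary.

```python
import numpy as np
from sklearn.cluster import KMeans


def bovw_histogram(descriptors, kmeans):
    """Map a variable number of descriptors to one fixed-length vector."""
    # Step 4: index of the nearest centroid (visual word) per descriptor.
    words = kmeans.predict(descriptors)
    # Step 5: count occurrences of each visual word.
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(np.float64)
    # L2-normalise so that a dot product gives cosine similarity.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


rng = np.random.default_rng(0)

# Hypothetical stand-in data: 5 database images, 50 descriptors each.
database = [rng.random((50, 32)).astype(np.float32) for _ in range(5)]
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(np.vstack(database))

# Index: one histogram per database image.
index = np.array([bovw_histogram(d, kmeans) for d in database])

# Query: quantise with the SAME vocabulary, then rank by similarity.
query = database[2]
scores = index @ bovw_histogram(query, kmeans)
ranking = np.argsort(-scores)
print(ranking)
```

Because every image ends up as a vector of the same length (one bin per visual word), images with different numbers of descriptors become directly comparable. In this toy setup the query is itself a database image, so it should come out with a similarity of 1.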

Do I also have to extract a dictionary for every single image and then index the images?
Why is vector quantization (steps 4 and 5) necessary?

Can you suggest a possible way to proceed, or any article or tutorial on the topic?

NOTE: For the BOVW implementation I cannot use OpenCV, because it does not work with binary descriptors, so I need to try with the sklearn library.
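One possible workaround for binary descriptors with sklearn, offered only as a sketch under assumptions: unpack each descriptor into its individual bits, so that squared Euclidean distance between two bit vectors equals their Hamming distance, and run the ordinary `KMeans` on those bit vectors. The centroids are then real-valued rather than binary, so this is an approximation and not a true Hamming-space clustering. The random bytes below stand in for 32-byte descriptors such as ORB produces.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical stand-in: 200 binary descriptors of 32 bytes (256 bits) each.
binary_descriptors = rng.integers(0, 256, size=(200, 32), dtype=np.uint8)

# Unpack bytes into 0/1 values: shape (200, 256).
bits = np.unpackbits(binary_descriptors, axis=1).astype(np.float32)

# Sanity check: on bit vectors, squared Euclidean distance == Hamming distance.
a, b = binary_descriptors[0], binary_descriptors[1]
hamming = int(np.unpackbits(a ^ b).sum())
sq_euclid = int(np.sum((bits[0] - bits[1]) ** 2))
print(hamming, sq_euclid)

# Cluster the bit vectors; the centroids are real-valued (an approximation).
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(bits)
words = kmeans.predict(bits)
```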


Solution

  • Ok, this is pretty much what I was looking for:

    https://stackoverflow.com/a/8549874/8894489

    Hope this helps someone.