[SOLVED] SIFT descriptors values: OpenCV vs VLFeat

I'm trying to compare the SIFT implementation of OpenCV and VLFeat.

I noticed that the descriptor values produced by VLFeat are integers, such as:

0 0 0 0 0 0 0 0 0 0 0 17 45 20 26 0 1 ...

While for OpenCV:

0.0391555 0 0 0.0998274 0.235747 0 0 0.0276871 0.156622 ...

Note that these descriptors come from two different images.

I have two questions:

Why do the two libraries produce different kinds of values?
If I need the OpenCV representation for k-means using VLFeat (followed by VLAD encoding), do I need to change these values?

Solution

Disclaimer: I'm not an expert in OpenCV or VLFeat, but I think I know the answers.

VLFeat can generate both integer and float descriptors. To get integer descriptors, use the vl_sift function; to get float descriptors, use the vl_dsift function with the FloatDescriptors parameter.

VLFeat probably uses integer descriptors for performance reasons: calculations on integers are generally faster than on floats. This comes at the expense of precision, but in computer vision lower precision is often not critical. The description of the integer k-means algorithm even says: "While this is limiting for some application, it works well for clustering image descriptors, where very high precision is usually unnecessary".
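To get a feel for how little precision is lost, here is a minimal Python sketch that quantizes the float values quoted in the question to integers. The scale factor of 512 and the clamp at 255 are assumptions about how a float SIFT descriptor is typically mapped to 8-bit integers, and quantize_descriptor is a hypothetical helper, not a function from either library.

```python
def quantize_descriptor(desc, scale=512.0, max_value=255):
    """Map float descriptor components to integers in [0, max_value].

    Assumption: multiply by `scale`, truncate toward zero, clamp to
    `max_value`. This mirrors a common 8-bit SIFT convention but is
    only an illustration.
    """
    return [min(int(scale * v), max_value) for v in desc]


# The float values quoted in the question (OpenCV output, truncated).
float_desc = [0.0391555, 0.0, 0.0, 0.0998274, 0.235747,
              0.0, 0.0, 0.0276871, 0.156622]

int_desc = quantize_descriptor(float_desc)

# Error introduced per component when mapping back to the float scale.
errors = [abs(f - q / 512.0) for f, q in zip(float_desc, int_desc)]

print(int_desc)
print(max(errors))
```

Under these assumptions every component that is not clamped is off by less than 1/512, which is small compared with the differences between descriptors of distinct image patches.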

As for the k-means algorithm: there is one version (vl_ikmeans) for integer descriptors and another (vl_kmeans) for float descriptors. With OpenCV descriptors, simply use the latter.
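If you ever want to feed integer descriptors to the float k-means as well, or compare them with float descriptors, one option is to convert them to floats and L2-normalize them. The following Python sketch uses the integer values quoted in the question; to_float_descriptor is a hypothetical helper, and the choice of L2 normalization is an assumption rather than something either library requires.

```python
import math


def to_float_descriptor(int_desc):
    """Convert an integer descriptor to a unit-length float descriptor.

    Assumption: L2 normalization is used to put descriptors on a
    common scale; an all-zero descriptor is returned unchanged.
    """
    values = [float(v) for v in int_desc]
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        return values
    return [v / norm for v in values]


# The integer values quoted in the question (VLFeat output, truncated).
int_desc = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 45, 20, 26, 0, 1]

float_desc = to_float_descriptor(int_desc)
length = math.sqrt(sum(v * v for v in float_desc))

print(float_desc)
print(length)
```

This step is optional: k-means works on either scale, and a common scale only matters when descriptors from both libraries are mixed in the same clustering run.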