I have a question about the last part of the SiftDescriptorExtractor job. I'm doing the following:
SiftDescriptorExtractor extractor;
Mat descriptors_object;
extractor.compute( img_object, keypoints_object, descriptors_object );
Now I want to check the elements of the descriptors_object Mat:
std::cout<< descriptors_object.row(1) << std::endl;
The output looks like:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 32, 15, 0, 0, 0, 0, 0, 0, 73, 33, 11, 0, 0, 0, 0, 0, 0, 5, 114, 1, 0, 0, 0, 0, 51, 154, 20, 0, 0, 0, 0, 0, 154, 154, 1, 2, 1, 0, 0, 0, 154, 148, 18, 1, 0, 0, 0, 0, 0, 2, 154, 61, 0, 0, 0, 0, 5, 60, 154, 30, 0, 0, 0, 0, 34, 70, 6, 15, 3, 2, 1, 0, 14, 16, 2, 0, 0, 0, 0, 0, 0, 0, 154, 84, 0, 0, 0, 0, 0, 0, 154, 64, 0, 0, 0, 0, 0, 0, 6, 6, 1, 0, 1, 0, 0, 0]
But in Lowe's paper it is stated that:
Therefore, we reduce the influence of large gradient magnitudes by thresholding the values in the unit feature vector to each be no larger than 0.2, and then renormalizing to unit length. This means that matching the magnitudes for large gradients is no longer as important, and that the distribution of orientations has greater emphasis. The value of 0.2 was determined experimentally using images containing differing illuminations for the same 3D objects.
So the numbers in the feature vector should be no larger than 0.2.
The question is: how have these values been converted into a Mat object?
So the numbers in the feature vector should be no larger than 0.2.
No. The paper says that SIFT descriptors are first normalized to unit length, then thresholded using 0.2 as a threshold (i.e. loop over the normalized values and truncate when appropriate), and finally re-normalized to unit length. So in theory any SIFT descriptor component is between [0, 1], even though in practice the effective range observed is smaller (see below).
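As an aside, a minimal sketch of those three steps (not the actual OpenCV code; the raw 128-element histogram is assumed to live in a std::vector<float> named desc) might look like:
#include <algorithm>
#include <cmath>
#include <vector>

void normalizeSiftDescriptor( std::vector<float>& desc )
{
    // normalize to unit length (guarding against a zero norm)
    auto renormalize = [&desc]() {
        float nrm2 = 0.f;
        for( float v : desc ) nrm2 += v*v;
        float inv = 1.f/std::max( std::sqrt(nrm2), 1e-7f );
        for( float& v : desc ) v *= inv;
    };
    renormalize();                  // step 1: normalize to unit length
    for( float& v : desc )
        v = std::min( v, 0.2f );    // step 2: threshold each component at 0.2
    renormalize();                  // step 3: re-normalize to unit length
}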
The question is: how have these values been converted into a Mat object?
They are converted from floating-point values to unsigned chars.
Here's the related section from the calcSIFTDescriptor method in OpenCV's modules/nonfree/src/sift.cpp:
float nrm2 = 0;
len = d*d*n;
// compute the squared L2 norm of the raw descriptor
for( k = 0; k < len; k++ )
    nrm2 += dst[k]*dst[k];
// threshold relative to the norm: equivalent to normalizing to
// unit length first and clamping each component at 0.2
float thr = std::sqrt(nrm2)*SIFT_DESCR_MAG_THR;
for( i = 0, nrm2 = 0; i < k; i++ )  // k == len after the first loop
{
    float val = std::min(dst[i], thr);
    dst[i] = val;
    nrm2 += val*val;
}
// re-normalize and scale by 512 before truncating to [0, 255]
nrm2 = SIFT_INT_DESCR_FCTR/std::max(std::sqrt(nrm2), FLT_EPSILON);
for( k = 0; k < len; k++ )
{
    dst[k] = saturate_cast<uchar>(dst[k]*nrm2);
}
With:
static const float SIFT_DESCR_MAG_THR = 0.2f;
static const float SIFT_INT_DESCR_FCTR = 512.f;
This is because classical SIFT implementations quantize the normalized floating-point values into unsigned char integers through a multiplying factor of 512, which is equivalent to assuming that any SIFT component varies within [0, 1/2], thus avoiding the loss of precision that would come from trying to encode the full [0, 1] range.
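So if you want approximate floating-point descriptor values back in the [0, 1/2] range, you can simply divide by 512. A sketch using cv::Mat::convertTo, assuming the descriptors_object from the question:
Mat descriptors_float;
// scale each stored value by 1/512 while converting to float
descriptors_object.convertTo( descriptors_float, CV_32F, 1.0/512.0 );
std::cout << descriptors_float.row(1) << std::endl;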