Tags: image-processing, computer-vision, image-recognition, cbir

Why are SIFT descriptors scale invariant?


My understanding: the SIFT descriptor uses a histogram of gradient orientations computed from a 16x16 pixel neighbourhood around the keypoint. In a large image, a 16x16 area can be a very small region, e.g. 1/10 of one hair on a cat's paw. But when you resize that image down, the 16x16 neighbourhood around the same keypoint can cover a large part of the image, e.g. the cat's whole paw. So it doesn't make sense to me to compare the original image with the resized image using SIFT descriptors. Can anyone tell me what's wrong with my understanding?


Solution

  • This is a rough description, but it should give you an understanding of the approach.

    One of the stages of SIFT is to build a pyramid of scaled versions of the image: the image is repeatedly smoothed with a low-pass (Gaussian) filter and scaled down.

    The feature detector then works by finding features that have a peak response not only in image space, but in scale space too. That is, it finds the scale at which the feature produces the strongest response (the first sketch below illustrates this idea).

    Then, the descriptor is calculated at that detected scale, i.e. the 16x16 neighbourhood is taken from the matching pyramid level rather than from the original image. So when you use a smaller/larger version of the image, the detector should still lock onto the same scale for the feature, and the descriptors remain comparable (the second sketch below demonstrates this with OpenCV).
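
To make the scale-space idea concrete, here is a minimal sketch (not OpenCV's or the original paper's actual implementation) of building one octave of a difference-of-Gaussians stack and testing whether a point is an extremum across both space and scale. The function names, the sigma schedule, and the use of SciPy are my own illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_scale_space(image, num_scales=5, sigma0=1.6, k=2 ** 0.5):
    """Build one octave of a difference-of-Gaussians (DoG) stack.

    Each DoG layer approximates the scale-normalised Laplacian response
    at one scale; SIFT looks for extrema across this stack.
    """
    blurred = [gaussian_filter(image.astype(np.float64), sigma0 * k ** i)
               for i in range(num_scales)]
    return np.stack([blurred[i + 1] - blurred[i]
                     for i in range(num_scales - 1)])

def is_scale_space_extremum(dog, s, y, x):
    """True if (y, x) at scale index s beats all 26 neighbours in the
    3x3x3 neighbourhood spanning space *and* scale."""
    patch = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
    centre = patch[13]                      # the middle of the 3x3x3 cube
    others = np.delete(patch, 13)           # its 26 neighbours
    return (centre > others).all() or (centre < others).all()
```

A keypoint is only kept when the centre response peaks along the scale axis too, which is exactly what pins down "the scale of the feature" independently of the image's resolution.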
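
And here is an end-to-end check using OpenCV's SIFT implementation (assuming OpenCV >= 4.4, where `cv2.SIFT_create` lives in the main module; `cat.jpg` is a placeholder file name): detect keypoints in an image and in a half-sized copy, match their descriptors, and compare the detected keypoint sizes. If the descriptor really is computed at the detected scale, matched keypoints should report sizes that differ by roughly the resize factor:

```python
import cv2
import numpy as np

img = cv2.imread("cat.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name
small = cv2.resize(img, None, fx=0.5, fy=0.5,
                   interpolation=cv2.INTER_AREA)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img, None)
kp2, des2 = sift.detectAndCompute(small, None)

# Match descriptors and keep the good ones via Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# For true correspondences, the detected keypoint sizes should differ
# by roughly the resize factor (2x here): that is scale invariance.
ratios = [kp1[m.queryIdx].size / kp2[m.trainIdx].size for m in good]
print("median keypoint size ratio:", np.median(ratios))  # expect ~2.0
```

The descriptors match despite the resize because both were computed over a 16x16 grid *at the feature's own scale*, so the grid covers the same physical region of the cat in both images.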