machine-learning computer-vision viola-jones

Haar-like features to detect objects


I was studying the Viola-Jones paper to better understand their object detection algorithm and to write a working implementation. In the last paragraph of the section on features, the authors talk about the base resolution of the detector, which is 24x24. They say the exhaustive set of rectangle features is quite large, over 180,000, and note that, unlike the Haar basis, the set of rectangle features is overcomplete. Does this mean that every single rectangle feature is 24 by 24, or does it simply mean that we divide a given image into 24x24 blocks? Is 180,000 the result of finding several types of Haar-like features for every 24x24 block? I also couldn't understand the last part, which states that the set of rectangle features is overcomplete. What does being overcomplete mean when we are talking about rectangle features? Thanks.


Solution

  • Every rectangle feature in the 24x24 window gives you only one number, as stated earlier in the same paragraph: "The value of a two-rectangle feature is the difference between the sum of the pixels within two rectangular regions" and "A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle. Finally a four-rectangle feature computes the difference between diagonal pairs of rectangles."

    An explanation of the number 180,000 can be found in: Viola-Jones' face detection claims 180k features

    An overcomplete set means that some features are linear combinations of other features. For the 24x24 window we can build a linear basis for this space by taking all the images that have value 1 at exactly one pixel and zero everywhere else. Counting these gives 24*24 = 576 basis elements, which is much less than 180,000. Each rectangle feature is a linear function of those 576 pixel values, so at most 576 features can be linearly independent. This means that within the set of 180,000 there must be features that can be obtained as linear combinations of other features in the set.
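
To make the counting argument concrete, here is a minimal sketch that enumerates every position and size of a feature inside a 24x24 window and compares the total with the 576 pixel dimensions. It assumes five upright feature shapes (two two-rectangle, two three-rectangle, one four-rectangle) scaled by whole cells; the shape list and the function names are my own choices for illustration and are not taken from the paper. Under these assumptions the total is 162,336 rather than "over 180,000", and the exact figure depends on which shapes and scales you count.

```python
# Base shapes measured in cells (width, height). This set of five upright
# shapes is an assumption made for illustration.
BASE_SHAPES = {
    "two-rectangle, side by side": (2, 1),
    "two-rectangle, stacked": (1, 2),
    "three-rectangle, side by side": (3, 1),
    "three-rectangle, stacked": (1, 3),
    "four-rectangle": (2, 2),
}


def count_placements(window, base_w, base_h):
    """Count every scaled copy of one shape at every position in the window."""
    total = 0
    # Widths and heights grow in whole multiples of the base shape,
    # so all sub-rectangles of a feature keep equal size.
    for w in range(base_w, window + 1, base_w):
        for h in range(base_h, window + 1, base_h):
            # Number of top-left corners where a w x h feature still fits.
            total += (window - w + 1) * (window - h + 1)
    return total


def count_features(window=24):
    """Return the number of placements per shape for a square window."""
    return {
        name: count_placements(window, bw, bh)
        for name, (bw, bh) in BASE_SHAPES.items()
    }


counts = count_features(24)
total = sum(counts.values())
pixel_dim = 24 * 24  # dimension of the space of 24x24 images

for name, n in counts.items():
    print(f"{name}: {n}")
print("total features:", total)        # 162336
print("pixel dimensions:", pixel_dim)  # 576
```

Since every feature is a linear function of the 576 pixel values, any total above 576 already implies linear dependence among the features, which is what "overcomplete" refers to here.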