c++algorithmimage-processingopencv

Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition


One of the most interesting projects I've worked on in the past couple of years was a project about image processing. The goal was to develop a system to be able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans', you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle with scale and rotation.

Template matching

Some constraints on the project:

So you could end up with tricky things like this (which in this case had my algorithm totally fail):

Total fail

I did this project a while ago, and had a lot of fun doing it, and I had a decent implementation. Here are some details about my implementation:

Language: Done in C++ using OpenCV library.

Pre-processing: For the image pre-processing, i.e. transforming the image into a more raw form to give to the algorithm, I used 2 methods:

  1. Changing color domain from RGB to HSV and filtering based on "red" hue, saturation above a certain threshold to avoid orange-like colors, and filtering of low value to avoid dark tones. The end result was a binary black and white image, where all white pixels would represent the pixels that match this threshold. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with. Binarized image
  2. Noise filtering using median filtering (taking the median pixel value of all neighbors and replace the pixel by this value) to reduce noise.
  3. Using Canny Edge Detection Filter to get the contours of all items after 2 precedent steps. Contour detection

Algorithm: The algorithm itself I chose for this task was taken from this awesome book on feature extraction and called Generalized Hough Transform (pretty different from the regular Hough Transform). It basically says a few things:

In the end, you end up with a heat map of the votes, for example here all the pixels of the contour of the can will vote for its gravitational center, so you'll have a lot of votes in the same pixel corresponding to the center, and will see a peak in the heat map as below:

GHT

Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (final scale and rotation factor will obviously be relative to your original template). In theory at least...

Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas:

Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?

I hope some people will also learn something out of it as well, after all I think not only people who ask questions should learn. :)


Solution

  • An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT) or Speeded Up Robust Features (SURF).

    You can find a nice OpenCV code example in Java, C++, and Python on this page: Features2D + Homography to find a known object

    Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).

    Enter image description here

    Image source: tutorial example

    The processing takes a few hundred ms for SIFT, SURF is bit faster, but it not suitable for real-time applications. ORB uses FAST which is weaker regarding rotation invariance.

    The original papers