python image-processing opencv ocr simplecv

Accurate binary image classification

I'm trying to extract letters from a game board for a project. Currently, I can detect the game board, segment it into the individual squares and extract images of every square.

The input I'm getting is like this (these are individual letters):

[Image: sample extracted letter images]

At first, I was counting the number of black pixels per image and using that as a way of identifying the different letters, which worked somewhat well for controlled input images. The problem I have, though, is that I can't make this work for images that differ slightly from these.

I have around 5 samples of each letter to work with for training, which should be good enough.

Does anybody know what would be a good algorithm to use for this?

My ideas were (after normalizing the image):

Counting the difference between an image and every letter image to see which one produces the least amount of error. This won't work for large datasets, though.
Detecting corners and comparing relative locations.
???

Any help would be appreciated!

Solution

This looks like a supervised learning problem. You need to extract features from the images and then classify each image based on the feature vector you've computed for it.

Feature Extraction

At first sight, the feature extraction part looks like a good scenario for Hu moments. Just calculate the image moments, then compute cv::HuMoments from these. That gives you a 7-dimensional real-valued feature space (one feature vector per image). Alternatively, you could omit this step and use each pixel value as a separate feature. I think the suggestion in another answer goes in this direction, but adds a PCA compression to reduce the dimensionality of the feature space.

Classification

As for the classification part, you can use almost any classification algorithm you like. You could use an SVM for each letter (binary yes-no classification), you could use Naive Bayes (which letter is the most likely), or you could use a k-nearest-neighbor approach (kNN, minimum distance in feature space), e.g. with FLANN.
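To make the kNN option concrete, here is a small brute-force sketch in plain Python. It is only meant for a handful of samples per letter; the function name knn_predict and the 2-D feature vectors are made up for the example, and a library such as FLANN would replace the linear search for larger training sets.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    # Sort all training samples by Euclidean distance to the query, keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors for two letters.
train = [
    ((0.10, 0.90), "A"), ((0.15, 0.85), "A"), ((0.12, 0.95), "A"),
    ((0.80, 0.20), "B"), ((0.85, 0.25), "B"), ((0.78, 0.15), "B"),
]
prediction = knn_predict(train, (0.20, 0.80), k=3)
```

With real data, the tuples would hold whatever feature vector you computed per image (Hu moments, raw pixels, PCA coefficients) instead of these toy values.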

Especially for distance-based classifiers (e.g. kNN), you should consider normalizing your feature space (e.g. scale all dimensions to a common range before using Euclidean distance, or use something like the Mahalanobis distance). This avoids overrepresenting features with large value differences in the classification process.
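A simple way to do the scaling is min-max normalization, sketched below in plain Python. The function names and the sample values are hypothetical; the important detail is that the ranges are learned from the training vectors only and then reused for every new vector.

```python
def fit_min_max(vectors):
    """Return per-dimension (min, max) pairs computed from the training vectors."""
    return [(min(column), max(column)) for column in zip(*vectors)]

def apply_min_max(vector, ranges):
    """Scale each dimension to [0, 1] using ranges learned from training data."""
    scaled = []
    for value, (low, high) in zip(vector, ranges):
        span = high - low
        # A constant dimension carries no information; map it to 0.0.
        scaled.append((value - low) / span if span else 0.0)
    return scaled

# Hypothetical features on very different scales.
train_vectors = [[0.25, 1500.0], [0.5, 3000.0], [0.75, 2500.0]]
ranges = fit_min_max(train_vectors)
scaled = [apply_min_max(v, ranges) for v in train_vectors]
query_scaled = apply_min_max([0.5, 2250.0], ranges)
```

Without this step, the second dimension would dominate any Euclidean distance. Note that a new vector outside the training range will scale to values outside [0, 1], which is usually harmless for kNN.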

Evaluation

Of course you need training data, that is, feature vectors of images labeled with the correct letter. You also need a procedure to evaluate your classifier, e.g. cross-validation.
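With only about 5 samples per letter, leave-one-out cross-validation is one reasonable choice: hold out each sample in turn, train on the rest, and count how often the held-out sample is classified correctly. The sketch below assumes a 1-nearest-neighbor classifier and made-up feature vectors, purely for illustration.

```python
import math

def nearest_label(train, query):
    """1-nearest-neighbour: return the label of the closest training vector."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

def leave_one_out_accuracy(samples, classify=nearest_label):
    """Hold out each sample in turn, classify it using the rest, return accuracy."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if classify(rest, features) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical labelled feature vectors.
samples = [
    ((0.10, 0.90), "A"), ((0.15, 0.85), "A"), ((0.12, 0.95), "A"),
    ((0.80, 0.20), "B"), ((0.85, 0.25), "B"), ((0.78, 0.15), "B"),
]
accuracy = leave_one_out_accuracy(samples)
```

Any classifier with the same (train, query) signature can be passed as classify, so you can compare feature sets or algorithms on the same samples.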

You might also want to have a look at template matching. Here you would convolve (or, more precisely, cross-correlate) the candidate image with each of the patterns in your training set. High values in the output image indicate a high probability that the pattern is located at that position.