algorithm, object-recognition, image-comparison

How to recognize UI elements in an image?


I am trying to make an automation tool and am experimenting with a type of recording that takes screenshots and records user inputs. The idea is for the user to take a snapshot and highlight a rectangle on it, for example around the "submit" button. During playback, the program would take a screenshot of the open window and find the coordinates of the button by searching for the snapshot. So I need an algorithm to search an image for an exact (or very close) copy of the button image. The algorithms I've found so far compare the likeness of two whole images but cannot locate one image inside another, and algorithms for object recognition seem a bit over the top considering the "object" I'm trying to find will be a near-perfect match. Any ideas?


Solution

  • What you need is an efficient feature extraction method. This will depend on what you're looking for, but let's assume you're looking for the Send button in this image:

    Screenshot of a web form

    One of the characteristic features of this button is that it includes a pair of parallel line segments at the top and bottom. The same applies to the two text input fields, but for the button, the distance between the two lines is exactly 17 pixels.

    This is what you get if you take the pixel-wise maximum of the source image and a copy of itself shifted vertically by 17 pixels:

    Result of 17-pixel vertical shift and maximum value calculation
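
A minimal sketch of the shift-and-maximum step, assuming a grayscale image stored as a list of rows (0 = black, 255 = white). The function name `shifted_max`, the choice to shift upward, and the tiny synthetic image are illustrative assumptions, not part of the original answer. A pixel stays dark only where both the image and its shifted copy are dark, so only edge pairs with exactly the given spacing survive.

```python
def shifted_max(image, offset):
    """Pixel-wise maximum of a grayscale image and a copy shifted up by `offset` rows.

    `image` is a list of rows of grey levels (0 = black, 255 = white).
    Rows with no shifted counterpart are filled with white (assumption).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for y in range(height):
        if y + offset < height:
            # Keep the lighter of the two pixels at each column.
            row = [max(a, b) for a, b in zip(image[y], image[y + offset])]
        else:
            row = [255] * width
        result.append(row)
    return result


# Synthetic example (hypothetical data): a "button" whose horizontal edges
# are 3 rows apart (rows 1 and 4) and a "field" whose edges are 2 rows
# apart (rows 6 and 8).
WHITE, BLACK = 255, 0
image = [[WHITE] * 6 for _ in range(10)]
for y in (1, 4, 6, 8):
    for x in range(1, 5):
        image[y][x] = BLACK

result = shifted_max(image, 3)
dark_rows = [y for y, row in enumerate(result) if BLACK in row]
print(dark_rows)  # prints [1]
```

Only the row holding the top edge of the 3-row "button" keeps its dark pixels; the "field" with a different spacing disappears. On a real screenshot the same operation would typically be done with an array library rather than nested lists.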

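    The Send button now appears as a solid horizontal line, because the maximum keeps a pixel dark only where both the image and its shifted copy are dark, which happens where the button's top and bottom edges coincide. You can detect this quite easily by thresholding the image and looking for an unbroken sequence of black pixels. Just for reference, here's what I obtained after applying a 10px horizontal motion blur and thresholding at a grey level of 128: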

    Result of 10px horizontal motion blur and thresholding at grey level 128
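
One way to look for an unbroken sequence of black pixels is a simple row scan, sketched below. It skips the motion blur and instead applies a minimum run length directly, which is a substitution made here for brevity. The function name `find_dark_runs`, the `min_length` parameter, and the sample data are assumptions for illustration.

```python
def find_dark_runs(image, threshold=128, min_length=10):
    """Return (row, start_column, length) for each horizontal run of dark pixels.

    A pixel counts as dark when its grey level is below `threshold`.
    Runs shorter than `min_length` are ignored.
    """
    runs = []
    for y, row in enumerate(image):
        start = None
        # Iterate one position past the end so a run touching the
        # right edge is closed as well.
        for x in range(len(row) + 1):
            dark = x < len(row) and row[x] < threshold
            if dark and start is None:
                start = x
            elif not dark and start is not None:
                if x - start >= min_length:
                    runs.append((y, start, x - start))
                start = None
    return runs


# Synthetic example (hypothetical data).
WHITE, BLACK = 255, 0
image = [[WHITE] * 20 for _ in range(3)]
for x in range(2, 17):
    image[1][x] = BLACK      # 15-pixel run: long enough
for x in range(0, 4):
    image[2][x] = BLACK      # 4-pixel run: too short
for x in range(10, 20):
    image[2][x] = BLACK      # 10-pixel run touching the right edge

runs = find_dark_runs(image, threshold=128, min_length=10)
print(runs)  # prints [(1, 2, 15), (2, 10, 10)]
```

Each reported run is a candidate position for the button; choosing `min_length` close to the button's known width would filter out most unrelated lines.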

    This process will identify candidate positions quite quickly. You can then subject these locations to stronger techniques like 2D convolution and OCR without too much loss of performance.
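
As a hedged illustration of that second stage, the sketch below checks each candidate position against the recorded button snapshot using the mean absolute difference of grey levels. This is a simpler stand-in for the 2D convolution mentioned above, chosen because the target is expected to be a near-perfect match. The names `mean_abs_diff` and `best_candidate`, the `max_diff` tolerance, and the sample data are assumptions.

```python
def mean_abs_diff(image, template, top, left):
    """Mean absolute grey-level difference between `template` and the
    same-sized patch of `image` whose top-left corner is (top, left)."""
    th, tw = len(template), len(template[0])
    total = 0
    for dy in range(th):
        for dx in range(tw):
            total += abs(image[top + dy][left + dx] - template[dy][dx])
    return total / (th * tw)


def best_candidate(image, template, candidates, max_diff=10.0):
    """Return (score, top, left) for the best-matching candidate,
    or None if no candidate scores at or below `max_diff`."""
    th, tw = len(template), len(template[0])
    best = None
    for top, left in candidates:
        # Skip candidates where the template would not fit in the image.
        if top < 0 or left < 0:
            continue
        if top + th > len(image) or left + tw > len(image[0]):
            continue
        score = mean_abs_diff(image, template, top, left)
        if score <= max_diff and (best is None or score < best[0]):
            best = (score, top, left)
    return best


# Synthetic example (hypothetical data): paste a 2x3 template into a
# white 6x6 image at row 2, column 1, then test three candidates.
template = [[0, 255, 0],
            [0, 0, 0]]
image = [[255] * 6 for _ in range(6)]
for dy, row in enumerate(template):
    for dx, value in enumerate(row):
        image[2 + dy][1 + dx] = value

match = best_candidate(image, template, [(0, 0), (2, 1), (5, 5)])
print(match)  # prints (0.0, 2, 1)
```

Because only a handful of candidate positions are compared, the cost of the pixel-by-pixel check stays small even for large screenshots. A nonzero `max_diff` leaves room for minor rendering differences between recording and playback.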