[SOLVED] Isolate colored text regions in video stream

I want to detect colored text in live video from a height of 5-6 meters. The text is about 30-40 cm wide. I have tried a few methods. One is HSV thresholding to detect the colors, but it is not reliable because the HSV values change when the illumination of the environment changes, and it stops detecting the colors beyond about 30 cm. I also looked into OCR for text recognition. According to my research, people recommend color detection for this task because it is easier than OCR and is sufficient for the desired result.

In short, how can I detect red and green text from 5 to 6 meters away in a live video stream, regardless of whether the scene is indoors or outdoors?

Solution

This is more a suggestion for a possible way forward than a solution, but one thought would be to examine the aggregate hue of each row in the image.

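Green (the top label) has a hue value of ~90, and red (the bottom label) has a hue value of ~0. The exact numbers depend on the hue scale in use: colorsys.rgb_to_hsv, used in the code below, returns hue in the range 0 to 1, where pure green is about 0.33 and pure red is 0. Either way green sits above red, so if we compute the sum of the hue values for each row in the image, we'd expect the greenest rows to have the highest sums and the red rows to have the lowest.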
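One caveat before relying on red being the lowest hue: hue is circular, so a red that leans slightly toward blue lands near the top of the scale instead of at 0. The sketch below checks this with the same colorsys function; the helper name hue01 and the sample colours are illustrative assumptions, not values measured from the image.

```python
from colorsys import rgb_to_hsv

def hue01(r, g, b):
    """Hue of an 8-bit RGB colour on colorsys's 0-1 scale."""
    h, _s, _v = rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h

pure_green = hue01(0, 255, 0)     # about 1/3
pure_red = hue01(255, 0, 0)       # exactly 0
bluish_red = hue01(255, 0, 25)    # close to 1, not close to 0
print(pure_green, pure_red, bluish_red)
```

If the red label in a given frame falls on the high side of the wrap, its rows would sum high rather than low, so it may be worth inspecting a few sample pixels first.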

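import numpy as np
import matplotlib.pyplot as plt
from colorsys import rgb_to_hsv
from PIL import Image
# Jupyter/IPython magic; omit the next line when running as a plain script
%matplotlib inline

# read in the image as an RGB array
# (scipy.misc.imread was removed in SciPy 1.2, so Pillow is used instead)
img = np.asarray(Image.open('vUvMl.jpg').convert('RGB'))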

# find the sum of the Hue, Saturation, and Value values
# for each row in the image, top to bottom
rows = []
h_vals = []
s_vals = []
v_vals = []

for idx, row in enumerate(img):
    row_h = 0
    row_s = 0
    row_v = 0
    for pixel in row:
        # scale 8-bit channel values to the 0-1 range expected by rgb_to_hsv
        r, g, b = pixel / 255.0
        h, s, v = rgb_to_hsv(r, g, b)
        row_h += h
        row_s += s
        row_v += v
    h_vals.append(row_h)
    s_vals.append(row_s)
    v_vals.append(row_v)
    rows.append(idx)

# plot the aggregate hue values for each row of the image
plt.scatter(rows, h_vals)
plt.title('Aggregate hue values for each row in image')
plt.show()

Result:

The plot has high values toward the left (the first rows, i.e. the top of the image) and low values toward the right (the last rows, i.e. the bottom), suggesting the green text is at the top of the image and the red text is at the bottom.
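To read those positions off programmatically rather than from the plot, one option is to take the indices of the largest and smallest aggregates. A minimal sketch, assuming h_vals is the list built in the loop above; extreme_rows is a hypothetical helper and the sample numbers are made up purely for illustration:

```python
def extreme_rows(h_vals):
    """Return (index of the largest aggregate hue, index of the smallest)."""
    indices = range(len(h_vals))
    greenest = max(indices, key=lambda i: h_vals[i])
    reddest = min(indices, key=lambda i: h_vals[i])
    return greenest, reddest

# Made-up aggregate hues for a six-row image, for illustration only
example_h_vals = [1.33, 1.30, 0.45, 0.43, 0.02, 0.00]
greenest_row, reddest_row = extreme_rows(example_h_vals)
print(greenest_row, reddest_row)
```

On real frames a single extreme row is likely to be noisy, so smoothing the aggregates or taking a band of neighbouring rows would probably be more robust.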

You'd need to transpose the image matrix and find the column-wise hue values if one of the labels were on the left/right side of the image, but hopefully this gives you some ideas.
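For live video the per-pixel Python loop is likely to be too slow, and summing along the other axis avoids an explicit transpose. The sketch below is one possible NumPy version under those assumptions; hue_sums and its axis parameter are hypothetical names, and the hue formula is re-implemented to mirror what colorsys.rgb_to_hsv computes.

```python
import numpy as np

def hue_sums(img, axis=1):
    """Sum per-pixel hue (0-1 scale) along one axis of an H x W x 3 uint8 RGB image.

    axis=1 gives one total per row, axis=0 gives one total per column.
    """
    rgb = np.asarray(img, dtype=np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc = rgb.max(axis=-1)
    delta = maxc - rgb.min(axis=-1)
    # Avoid dividing by zero on grey pixels; their hue is set to 0 below
    safe = np.where(delta == 0, 1.0, delta)
    hue = np.where(
        maxc == r, (g - b) / safe,
        np.where(maxc == g, 2.0 + (b - r) / safe, 4.0 + (r - g) / safe),
    )
    hue = (hue / 6.0) % 1.0
    hue[delta == 0] = 0.0
    return hue.sum(axis=axis)

# Synthetic 2 x 4 image: green on the left half, red on the right half
green, red = (0, 200, 0), (200, 0, 0)
demo = np.array([[green, green, red, red]] * 2, dtype=np.uint8)
col_sums = hue_sums(demo, axis=0)   # one value per column
row_sums = hue_sums(demo, axis=1)   # one value per row
print(col_sums, row_sums)
```

In the synthetic image the left columns get the larger totals, which is the column-wise analogue of the row plot above.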