[SOLVED] How to find model no. in an image generated by OCR?

(Examples are changed but the idea is the same)

I'm trying to find an SRD Model No. on a product label from a live camera feed.

Here's a label example:

[Image: example product label]

The conditions are:

Different generations of the product have different structures of the information on the label.
The length of the SRD Model No. varies from generation to generation.
The SRD Model No. can contain either only numbers or both numbers and letters, depending on the generation.

So the question is: is there a way to find the SRD Model No. as a substring of a string generated by OCR, other than hard-coding all possible variations?

Solution

Here is an example script following @Angus Comber's suggestion:

import cv2
import pytesseract
from pytesseract import Output

def extract_SRD(filename):
    # Load the label image; cv2.imread returns None if the file cannot be read
    img = cv2.imread(filename)
    if img is None:
        raise FileNotFoundError(filename)
    # Convert to grayscale and blur slightly to reduce noise before OCR
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img_blur = cv2.GaussianBlur(img_gray, (3, 3), 0)

    # --psm 6 tells Tesseract to assume a single uniform block of text
    mydata = pytesseract.image_to_data(img_blur, output_type=Output.DICT, config='--psm 6')
    # Take the second token after 'SRD', which skips the word "Model"
    SRD = mydata['text'][mydata['text'].index('SRD')+2]

    return SRD

filename = 'wm3tG.png'

SRD = extract_SRD(filename)

print(SRD)

This snippet prints: 5427G2

The important line here is SRD = mydata['text'][mydata['text'].index('SRD')+2]. This is where you define the logic used to retrieve the SRD code. In this example, I simply take the second string of characters after "SRD", thus skipping the word "Model". Note that index('SRD') only finds an entry that is exactly "SRD", and it raises a ValueError if there is none.
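To see what that lookup does without running OCR, here is a small sketch on a hand-written dictionary. The token list is made up for illustration and only mimics the shape of the 'text' field returned by image_to_data, which also contains empty strings for block, paragraph and line entries.

```python
# Hypothetical data: shaped like the 'text' field of image_to_data output.
mydata = {'text': ['', '', '', 'SRD', 'Model', '5427G2', '', 'Serial', 'No.', '88123']}

tokens = mydata['text']
pos = tokens.index('SRD')   # position of the first token exactly equal to 'SRD'
SRD = tokens[pos + 2]       # skip 'Model' and take the token after it
print(SRD)

# list.index raises ValueError when no token is exactly 'SRD',
# for example when OCR reads the label as 'SRD:'.
try:
    ['SRD:', 'Model', '5427G2'].index('SRD')
    exact_match = True
except ValueError:
    exact_match = False
print(exact_match)
```

The second half shows why the fixed-offset lookup is fragile: any stray punctuation attached to "SRD" is enough to make the exact match fail.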

I would suggest fine-tuning this example to check whether any value in the output dictionary contains "SRD", rather than requiring an exact match. Then look at the next string of characters:

if this next string contains "Model", return the one after it;
if not, return that string of characters.
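A minimal sketch of that fine-tuning could look like the following. The function name find_srd_model and the pattern MODEL_RE are my own illustrative choices, not part of pytesseract. The pattern assumes a model number consists only of letters and digits with at least one digit, which is my reading of the conditions in the question, so adjust it to your real labels. The function takes the list from mydata['text'], so it can be tried without an image.

```python
import re

# Assumption: a model number is letters and digits only, with at least one digit.
MODEL_RE = re.compile(r'^(?=.*\d)[A-Za-z0-9]+$')

def find_srd_model(tokens):
    """Return the token following 'SRD' (and an optional 'Model'), or None."""
    # Drop the empty entries that image_to_data emits for non-word levels
    words = [t.strip() for t in tokens if t.strip()]
    for i, word in enumerate(words):
        if 'SRD' not in word:
            continue
        rest = words[i + 1:]
        # If the next string contains "Model", move on to the one after it
        if rest and 'Model' in rest[0]:
            rest = rest[1:]
        if rest:
            # Strip punctuation that OCR often attaches to a token
            candidate = rest[0].strip('.:,')
            if MODEL_RE.match(candidate):
                return candidate
    return None

print(find_srd_model(['', 'SRD', 'Model', '5427G2']))
print(find_srd_model(['SRD:', '12345', 'Serial']))
```

Returning None instead of raising lets the caller simply try the next frame of the camera feed. If your labels put other words such as "No." between "Model" and the code, this sketch will return None for them and the skipping logic would need to be extended.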