I am trying to build my own bubble-sheet OMR engine with Python 3.8 and OpenCV. Despite several days of debugging, I can't get past this error, which occurs when I crop the bubbles individually:
Traceback (most recent call last):
  File "C:\Users\hsolatges\AppData\Local\Programs\Python\Python38\lib\site-packages\numpy\lib\shape_base.py", line 867, in split
    len(indices_or_sections)
TypeError: object of type 'int' has no len()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:/autoQuiz/omr.py", line 80, in <module>
    anwsers_array = tb.parse_anwsers(anwsers_field, MCQ.get('QUESTIONS'), MCQ.get('CHOICES'))
    bubbles = get_bubbles(padded_img, questions, choices)
  File "c:\autoQuiz\toolbox.py", line 81, in get_bubbles
    rows = np.vsplit(img, questions)
  File "<__array_function__ internals>", line 5, in vsplit
  File "C:\Users\hsolatges\AppData\Local\Programs\Python\Python38\lib\site-packages\numpy\lib\shape_base.py", line 991, in vsplit
    return split(ary, indices_or_sections, 0)
  File "<__array_function__ internals>", line 5, in split
  File "C:\Users\hsolatges\AppData\Local\Programs\Python\Python38\lib\site-packages\numpy\lib\shape_base.py", line 872, in split
    raise ValueError(
ValueError: array split does not result in an equal division
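The final ValueError can be reproduced in isolation. Here is a minimal sketch that uses a made-up 10x7 array instead of a real scan (the array and the section counts are only for illustration): np.vsplit accepts an integer section count only when it divides the number of rows exactly, and np.hsplit has the same requirement for columns.

```python
import numpy as np

# Hypothetical stand-in for a thresholded bubble region: 10 rows, 7 columns.
img = np.zeros((10, 7), dtype=np.uint8)

# 10 rows divide evenly into 5 sections, so this split succeeds.
rows = np.vsplit(img, 5)
print(len(rows), rows[0].shape)

# 10 rows do not divide evenly into 4 sections, so NumPy raises ValueError.
error_message = None
try:
    np.vsplit(img, 4)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The TypeError at the top of the traceback appears to be raised and caught inside NumPy itself while it checks whether the second argument is an integer or a list of indices; the ValueError is the one that matters.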
Since the size of the bubble region is arbitrary, I tried to edge-pad it so that its height is a multiple of the number of questions and its width is a multiple of the number of choices (A B C D E). Unfortunately, this doesn't work properly: tests/omr-1.jpg is processed correctly, but the other test images fail.
Here is an excerpt of the code:
def to_next_multiple(n,b):
    return int(ceil(n/b) * b)

def pad_image(img, questions, choices):
    w, h = img.shape[:2]
    w_final, h_final, = to_next_multiple(w, choices), to_next_multiple(h, questions)
    w_padding, h_padding = max(0, w-w_final), max(0, h-h_final)
    padded_img = np.pad(img, ((0, h_padding), (0, w_padding)), 'edge')
    return padded_img

def get_bubbles(img, questions, choices):
    bubbles = []
    rows = np.vsplit(img, questions)
    for row in rows:
        cells = np.hsplit(row, choices)
        bubbles.append(cells)
    return bubbles

def parse_anwsers(img, questions, choices):
    # Otsu's thresholding after Gaussian filtering
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    blurred = cv.GaussianBlur(gray, (5,5), 0)
    retValue, thresh = cv.threshold(blurred, 0, 255, cv.THRESH_BINARY+cv.THRESH_OTSU)
    padded_img = pad_image(thresh, questions, choices)
    w, h = padded_img.shape[:2]
    # Debugging
    print(f'width: {w}, w_division: {w/choices}')
    print(f'height: {h}, h_division: {h/questions}')
    bubbles = get_bubbles(padded_img, questions, choices)
    answers_array = bubbles
    return answers_array
The repo can be found here: https://github.com/hsolatges/autoQuiz
How can I consistently get an image that is ready for np.vsplit/np.hsplit?
The issue came from the following two lines:
w_padding, h_padding = max(0, w-w_final), max(0, h-h_final)
padded_img = np.pad(img, ((0, h_padding), (0, w_padding)), 'edge')
They become:
w_padding, h_padding = w_final-w, h_final-h
padded_img = np.pad(img, ((0, w_padding), (0, h_padding)), 'edge')
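My math was sloppy and I had misread the NumPy axis order. First, w-w_final is never positive (w_final is w rounded up), so max(0, ...) always produced zero padding. Second, img.shape[:2] is (rows, columns), so the value I unpacked as w is the size along axis #0 and h is the size along axis #1. np.pad takes its (before, after) pairs in that same axis order, so w_padding has to come first.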
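For completeness, here is a minimal sketch of the padding step with the variables named after the axes they actually measure. It assumes, as get_bubbles implies, that each question is a band of rows (np.vsplit, axis 0) and each choice is a band of columns (np.hsplit, axis 1), so the row count is rounded up to a multiple of questions and the column count to a multiple of choices. The name pad_to_grid, the integer-only rounding (used here instead of ceil) and the 23x17 test array are made up for illustration; this is not the code from the repo.

```python
import numpy as np


def to_next_multiple(n, b):
    """Round n up to the nearest multiple of b using integer arithmetic."""
    return -(-n // b) * b


def pad_to_grid(img, questions, choices):
    """Edge-pad a 2-D image so it splits evenly into questions x choices cells."""
    # shape[:2] is (rows, columns), i.e. (height, width), in that order.
    rows, cols = img.shape[:2]
    # Target size minus current size, so the padding is never negative.
    row_padding = to_next_multiple(rows, questions) - rows
    col_padding = to_next_multiple(cols, choices) - cols
    # np.pad takes one (before, after) pair per axis, in axis order.
    return np.pad(img, ((0, row_padding), (0, col_padding)), mode='edge')


# Hypothetical 23x17 thresholded region with 5 questions of 4 choices each.
img = np.zeros((23, 17), dtype=np.uint8)
padded = pad_to_grid(img, questions=5, choices=4)
print(padded.shape)

# Both splits now succeed: one band of rows per question, one cell per choice.
bubbles = [np.hsplit(row, 4) for row in np.vsplit(padded, 5)]
print(len(bubbles), len(bubbles[0]), bubbles[0][0].shape)
```

Because w, h = img.shape[:2] puts the row count in w, it is worth checking which of questions and choices each dimension is rounded against. If they are paired with the wrong axis, the splits can still fail whenever the two counts differ.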