I have some questions, probably stupid, about the implementation of the adaptive thresholding by Bradley. I have read paper about it http://people.scs.carleton.ca:8008/~roth/iit-publications-iti/docs/gerh-50002.pdf and I am a bit confused. Mainly about this statement:
if ((in[i,j]*count) ≤ (sum*(100−t)/100)) then
Let's assume that we have this input:
width, i
[0] [1] [2]
+---+---+---+
height [0] | 1 | 2 | 2 |
j +---+---+---+
[1] | 3 | 4 | 3 |
+---+---+---+
[2] | 5 | 3 | 2 |
+---+---+---+
and let's say that:
s = 2
s/2 = 1
t = 15
i = 1
j = 1 (we are at the center pixel)
So that means we have a window 3x3, right? Then:
x1 = 0, x2 = 2, y1 = 0, y2 = 2
What is count then? If it is number of pixels in the window, why it is 2*2=4, instead of 3*3=9 according to the algorithm? Further, why is the original value of the pixel multiplied by the count?
The paper says that the value is compared to the average value of surrounding pixels, why it isn't
in[i,j] <= (sum/count) * ((100 - t) / 100)
then?
Can somebody please explain this to me? It is probably very stupid question but I can't figure it out.
Before we start, let's present the pseudocode of the algorithm written in their paper:
procedure AdaptiveThreshold(in,out,w,h)
1: for i = 0 to w do
2: sum ← 0
3: for j = 0 to h do
4: sum ← sum+in[i, j]
5: if i = 0 then
6: intImg[i, j] ← sum
7: else
8: intImg[i, j] ← intImg[i−1, j] +sum
9: end if
10: end for
11: end for
12: for i = 0 to w do
13: for j = 0 to h do
14: x1 ← i−s/2 {border checking is not shown}
15: x2 ← i+s/2
16: y1 ← j −s/2
17: y2 ← j +s/2
18: count ← (x2−x1)×(y2−y1)
19: sum ← intImg[x2,y2]−intImg[x2,y1−1]−intImg[x1−1,y2] +intImg[x1−1,y1−1]
20: if (in[i, j]×count) ≤ (sum×(100−t)/100) then
21: out[i, j] ← 0
22: else
23: out[i, j] ← 255
24: end if
25: end for
26: end for
intImg
is the integral image of the input image to threshold, assuming grayscale.
I've implemented this algorithm with success, so let's talk about your doubts.
What is
count
then? If it is number of pixels in the window, why it is 2*2=4, instead of 3*3=9 according to the algorithm?
There is an underlying assumption in the paper that they don't talk about. The value of s
needs to be odd, and the windowing should be:
x1 = i - floor(s/2)
x2 = i + floor(s/2)
y1 = j - floor(s/2)
y2 = j + floor(s/2)
count
is certainly the total number of pixels in the window, but you also need to make sure that you don't go out of bounds. What you have there should certainly be a 3 x 3 window and so s = 3
, not 2. Now, if s = 3
, but if we were to choose i = 0, j = 0
, we will have x
and y
values that are negative. We can't have this and so the total number of valid pixels within this 3 x 3 window centred at i = 0, j = 0
is 4, and so count = 4
. For windows that are within the bounds of the image, then count
would be 9.
Further, why is the original value of the pixel multiplied by the count? The paper says that the value is compared to the average value of surrounding pixels, why it isn't:
in[i,j] <= (sum/count) * ((100 - t) / 100)
then?
The condition you're looking at is at line 20 of the algorithm:
20: (in[i, j]×count) ≤ (sum×(100−t)/100)
The reason why we take a look at in[i,j]*count
is because we assume that in[i,j]
is the average intensity within the s x s
window. Therefore, if we examined a s x s
window and added up all of the intensities, this is equal to in[i,j] x count
. The algorithm is quite ingenious. Basically, we compare the assumed average intensity (in[i,j] x count
) within the s x s
window and if this is less than t%
of the actual average within this s x s
window (sum x ((100-t)/100)
), then the output is set to black. If it is larger, than the output is set to white. However, you have eloquently stated that it should be this instead:
in[i,j] <= (sum/count) * ((100 - t) / 100)
This is essentially the same as line 20, but you divided both sides of the equation by count
, so it's still the same expression. I would say that this explicitly states what I talked about above. The multiplication by count
is certainly confusing, and so what you have written makes more sense.
Therefore, you're just seeing it a different way, and that's totally fine! So to answer your question, what you have stated is certainly correct and is equivalent to the expression seen in the actual algorithm.
Hope this helps!