Tags: python, matlab, image-processing, image-enhancement

What is the best approach to enhance blacked-out areas to make the text inside them readable?


I am trying to enhance old hand-drawn maps that were digitized by scanning. The scanning process has caused some blacked-out areas in the image, making the text inside them very hard to read.

I tried adaptive histogram equalization and a couple of other histogram-based approaches in MATLAB, but nothing gives me the desired result. I can lighten the darker shades of grey and make the image look a bit better using adaptive histogram equalization, but it doesn't really help with the text.

Specifically, I tried MATLAB's adapthisteq() function with different parameter variations.

Something like this:

% X is a two-element tile grid, e.g. the default [8 8]
A = adapthisteq(I, 'NumTiles', X, 'ClipLimit', 0.01, 'Distribution', 'uniform');

... and I also tried changing the pixel values directly after having a look at the image, something like this:

I(I > 0  & I < 10)  = 0;     % crush near-black greys to pure black
I(I > 10 & I < 30)  = 10;    % flatten the next band of dark greys
I(I > 30 & I < 255) = 255;   % push everything brighter to pure white

Can I enhance the image to get an end result that is purely black and white, where the lines and text (basically all the information) turn black (0) and the shades of grey and the whiter regions turn white (255, or 1 for a logical image)?

Is this even possible? If not, how close can I get, and what is the best way to approach the desired result? Any help is appreciated.

Here's what the original image looks like:

Here's what the result looks like after I tried out my solution using adaptive histogram equalization:


Solution

  • Sounds like a classic case for adaptive thresholding. In a general sense, adaptive thresholding works by looking at local pixel neighbourhoods, computing the mean intensity, and checking whether each pixel exceeds a certain fraction of that mean. If it does, we set the output pixel to white; if not, we set it to black.
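
    As a quick aside: if you have a recent Image Processing Toolbox (R2016a or newer), you can try this idea in one line before going any further. The 0.4 sensitivity below is just a starting guess, not a tuned value.

        % Adaptive (locally thresholded) binarization for dark text on a light background
        BW = imbinarize(I, 'adaptive', 'ForegroundPolarity', 'dark', 'Sensitivity', 0.4);
        imshow(BW);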

    One classic approach is to use the Bradley-Roth algorithm.

    If you'd like to see an explanation of the algorithm, you can take a look at a previous answer that I wrote up about it:

    Bradley Adaptive Thresholding -- Confused (questions)

    However, if you want the gist of it: an integral image of the grayscale version of the image is computed first. The integral image is important because it lets you calculate the sum of pixels within any window in O(1) time. Computing the integral image itself is O(n^2) for an n x n image (i.e. linear in the number of pixels), but you only have to do it once. With the integral image, you scan pixel neighbourhoods of size s x s and check whether each pixel's intensity is more than t% below the average intensity within that s x s window. If it is, the pixel is classified as black; otherwise, it is classified as white. This is adaptive because the thresholding is done using local pixel neighbourhoods rather than a single global threshold.
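
    To make that concrete, here is a rough MATLAB sketch of the procedure described above. It is my own loop-based illustration, written for clarity rather than speed; it is not the code from the post linked below, and the name bradleyRoth is just a placeholder.

        % Bradley-Roth style adaptive thresholding (illustrative sketch)
        % I: grayscale image, s: window size, t: percentage below the local mean
        function out = bradleyRoth(I, s, t)
            I = double(I);
            [rows, cols] = size(I);

            % Integral image, padded with a zero row/column so that window
            % sums at the image border need no special cases
            intImg = zeros(rows + 1, cols + 1);
            intImg(2:end, 2:end) = cumsum(cumsum(I, 1), 2);

            out = true(rows, cols);          % default every pixel to white
            half = floor(s / 2);

            for r = 1:rows
                for c = 1:cols
                    % Clamp the s x s window to the image boundaries
                    r1 = max(r - half, 1);   r2 = min(r + half, rows);
                    c1 = max(c - half, 1);   c2 = min(c + half, cols);
                    count = (r2 - r1 + 1) * (c2 - c1 + 1);

                    % Sum over the window in O(1) using the integral image
                    winSum = intImg(r2+1, c2+1) - intImg(r1, c2+1) ...
                           - intImg(r2+1, c1)   + intImg(r1, c1);

                    % Pixel more than t% below the local mean -> black
                    if I(r, c) * count <= winSum * (100 - t) / 100
                        out(r, c) = false;
                    end
                end
            end
        end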

    In this post: Extract a page from a uniform background in an image, there is MATLAB code I wrote that implements the Bradley-Roth algorithm, so you're more than welcome to use it.

    However, for your image, the parameters I used to get some OK results were s = 12 and t = 25.
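
    With the illustrative sketch above, that corresponds to a call like this (again, the actual code I used lives in the linked post):

        BW = bradleyRoth(I, 12, 25);   % s = 12, t = 25
        imshow(BW);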

    After running the algorithm, I get this image:


    Be advised that it isn't perfect... but you can start to see some text that you couldn't see before. Specifically at the bottom, I can see Lemont Library - Built 1948...., which couldn't be made out in the original image.


    Play around with the code and the parameters, read up on the algorithm, and just try things out yourself.

    Hope this helps!