image-processing imagemagick image-scanner

Cleaning scanned grayscale images with ImageMagick


I have a lot of scans of text pages (black text on a white background).

My usual approach is to clean them in GIMP with the Curves dialog, using a simple curve with only four points: (0,0), (63,0), (224,255), (255,255).

This makes all the greyish text pitch black, sharpens the text, and turns most of the whitish pixels pure white.

How can I achieve the same effect in a script using ImageMagick or some other Linux tool that runs completely from the command line?

-normalize and -contrast-stretch don't work because they operate on pixel counts. I need an operator that makes the gray values 0-63 pitch black and everything above 224 pure white, and normalizes the values in between.


Solution

  • ImageMagick's Color Modifications page describes many of its color manipulation operators.

    In this specific case, two algorithms are interesting:

    -level gives you perfect black/white pixels near the ends of the curve and a linear distribution between.

    -sigmoidal-contrast creates a smoother curve between the extremes, which works better for color photos.

    To get a result similar to GIMP's, you can try applying one after the other (to make text and black areas really black).

    In all cases, you will want to run -normalize first (or even -contrast-stretch, to clip most of the noise) so that no black/white levels are wasted. Without this, the darkest color could be lighter than rgb(0,0,0) and the brightest color could be darker than pure white.
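As a rough sketch of the -level approach, the script below maps the curve points from the question (63 and 224 on a 0-255 scale) to the percentage form that -level accepts, so the result should not depend on the quantum depth of the ImageMagick build. The function names level_percentages and clean_scan are made up for this example, and the command is assumed to be `convert` (on ImageMagick 7 it is usually `magick`).

```shell
#!/bin/sh
# Convert 8-bit black/white points (0-255) into the "black%,white%" form
# accepted by -level. 63 and 224 are the curve points from the question.
level_percentages() {
    LC_ALL=C awk -v b="$1" -v w="$2" \
        'BEGIN { printf "%.2f%%,%.2f%%", b / 255 * 100, w / 255 * 100 }'
}

# Hypothetical helper: clean one scan.
# Usage: clean_scan input.png output.png
# -normalize runs first, as suggested above; drop it if it shifts the
# levels in a way you do not want.
clean_scan() {
    convert "$1" -normalize -level "$(level_percentages 63 224)" "$2"
}
```

After sourcing the script, calling `clean_scan page.png page-clean.png` should turn everything at or below gray value 63 black, everything at or above 224 white, and stretch the values in between linearly. The file names here are placeholders.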