I want to remove all images from a PDF file.
The page layouts should not change. All images should be replaced by empty space.
I'm putting up the answer myself, but the actual code is by courtesy of Chris Liddell, Ghostscript developer.
I used his original PostScript code and stripped off its other functions. Only the function which removes raster images remains. Other graphical page objects -- text sections, patterns and vector objects -- should remain untouched.
Copy the following code and save it as remove-images.ps
:
%!PS
% Run as:
%
% gs ..... -dFILTERIMAGE -dDELAYBIND -dWRITESYSTEMDICT \
% ..... remove-images.ps <your-input-file>
%
% derived from Chris Liddell's original 'filter-obs.ps' script
% Adapted by @pdfkungfoo (on Twitter)
currentglobal true setglobal
32 dict begin
/debugprint { systemdict /DUMPDEBUG .knownget { {print flush} if}
{pop} ifelse } bind def
/pushnulldevice {
systemdict exch .knownget not
{
//false
} if
{
gsave
matrix currentmatrix
nulldevice
setmatrix
} if
} bind def
/popnulldevice {
systemdict exch .knownget not
{
//false
} if
{
% this is hacky - some operators clear the current point
% i.e.
{ currentpoint } stopped
{ grestore }
{ grestore moveto} ifelse
} if
} bind def
/sgd {systemdict exch get def} bind def
systemdict begin
/_image /image sgd
/_imagemask /imagemask sgd
/_colorimage /colorimage sgd
/image {
(\nIMAGE\n) //debugprint exec /FILTERIMAGE //pushnulldevice exec
_image
/FILTERIMAGE //popnulldevice exec
} bind def
/imagemask
{
(\nIMAGEMASK\n) //debugprint exec
/FILTERIMAGE //pushnulldevice exec
_imagemask
/FILTERIMAGE //popnulldevice exec
} bind def
/colorimage
{
(\nCOLORIMAGE\n) //debugprint exec
/FILTERIMAGE //pushnulldevice exec
_colorimage
/FILTERIMAGE //popnulldevice exec
} bind def
end
end
.bindnow
setglobal
Now run this command:
gs -o no-more-images-in-sample.pdf \
-sDEVICE=pdfwrite \
-dFILTERIMAGE \
-dDELAYBIND \
-dWRITESYSTEMDICT \
remove-images.ps \
sample.pdf
I tested the code with the official PDF specification, and it worked. The following two screenshots show page 750 of input and output PDFs:
If you wonder why something that looks like an image is still on the output page: it is not really a raster image, but a 'pattern' in the original file, and therefor it is not removed.