pdf ghostscript postscript

How can I remove all images from a PDF?


I want to remove all images from a PDF file.

The page layouts should not change. All images should be replaced by empty space.


Solution

  • I'm posting the answer myself, but the actual code is courtesy of Chris Liddell, a Ghostscript developer.

    I took his original PostScript code and stripped out its other functions, so only the part that removes raster images remains. Other graphical page objects (text, patterns and vector objects) should remain untouched.

    Copy the following code and save it as remove-images.ps:

    %!PS
    
    % Run as:
    %
    %      gs ..... -dFILTERIMAGE -dDELAYBIND -dWRITESYSTEMDICT \
    %                 ..... remove-images.ps <your-input-file>
    %
    % derived from Chris Liddell's original 'filter-obs.ps' script
    % Adapted by @pdfkungfoo (on Twitter)
    
    currentglobal true setglobal
    
    32 dict begin
    
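    % Print the string only when -dDUMPDEBUG is true; always consume it,
    % so nothing is left on the stack when DUMPDEBUG is defined but false.
    /debugprint     { systemdict /DUMPDEBUG .knownget
                      { {print flush} {pop} ifelse }
                      {pop} ifelse } bind def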
    
    /pushnulldevice {
      systemdict exch .knownget not
      {
        //false
      } if
    
      {
        gsave
        matrix currentmatrix
        nulldevice
        setmatrix
      } if
    } bind def
    
    /popnulldevice {
      systemdict exch .knownget not
      {
        //false
      } if
      {
        % this is hacky - some operators clear the current point,
        % so carry it across the grestore only if one is defined
        { currentpoint } stopped
        { grestore }
        { grestore moveto} ifelse
      } if
    } bind def
    
    /sgd {systemdict exch get def} bind def
    
    systemdict begin
    
    /_image /image sgd
    /_imagemask /imagemask sgd
    /_colorimage /colorimage sgd
    
    /image
    {
      (\nIMAGE\n) //debugprint exec
      /FILTERIMAGE //pushnulldevice exec
      _image
      /FILTERIMAGE //popnulldevice exec
    } bind def
    
    /imagemask
    {
      (\nIMAGEMASK\n) //debugprint exec
      /FILTERIMAGE //pushnulldevice exec
      _imagemask
      /FILTERIMAGE //popnulldevice exec
    } bind def
    
    /colorimage
    {
      (\nCOLORIMAGE\n) //debugprint exec
      /FILTERIMAGE //pushnulldevice exec
      _colorimage
      /FILTERIMAGE //popnulldevice exec
    } bind def
    
    end
    end
    
    .bindnow
    
    setglobal
    

    Now run this command:

    gs -o no-more-images-in-sample.pdf \
       -sDEVICE=pdfwrite               \
       -dFILTERIMAGE                   \
       -dDELAYBIND                     \
       -dWRITESYSTEMDICT               \
        remove-images.ps               \
        sample.pdf
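
Newer Ghostscript releases (9.23 and later, to the best of my knowledge) document -dFILTERIMAGE as a built-in switch, alongside -dFILTERTEXT and -dFILTERVECTOR, so the PostScript file may not be needed there. The following is only a sketch under that assumption; the function name strip_images and the GS_BIN override are hypothetical names introduced for illustration, not part of Ghostscript.

```shell
# Assumption: a Ghostscript release where -dFILTERIMAGE is built in
# (9.23 or later, as far as I know). On older releases, use the
# remove-images.ps command shown above instead.
# GS_BIN is a hypothetical override for the Ghostscript executable.
strip_images() {
  if [ "$#" -ne 2 ]; then
    echo "usage: strip_images input.pdf output.pdf" >&2
    return 2
  fi
  # Same device and output option as the command above, minus the script
  # and the -dDELAYBIND / -dWRITESYSTEMDICT switches it needs.
  "${GS_BIN:-gs}" -o "$2" -sDEVICE=pdfwrite -dFILTERIMAGE "$1"
}

# Example call, using the file names from the command above:
# strip_images sample.pdf no-more-images-in-sample.pdf
```

If the built-in switch is available, it avoids redefining operators in systemdict, which is why the two extra switches are not needed in this variant.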
    

    I tested the code with the official PDF specification, and it worked. The following two screenshots show page 750 of the input and output PDFs:

    If you are wondering why something that looks like an image is still on the output page: it is not really a raster image, but a 'pattern' in the original file, and therefore it is not removed.
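
Besides comparing pages by eye, a rough check is to count the images a separate tool reports before and after. The sketch below assumes pdfimages from poppler-utils is installed and that its -list output begins with two header lines followed by one row per image; count_image_rows is a hypothetical helper name.

```shell
# Reads `pdfimages -list` output on stdin and prints the number of image rows.
# Assumption: the first two lines are the column header and a dashed rule.
count_image_rows() {
  # Skip the two header lines, count remaining non-empty rows, print 0 if none.
  awk 'NR > 2 && NF { n++ } END { print n + 0 }'
}

# Example, using the file names from the command above:
# pdfimages -list sample.pdf | count_image_rows
# pdfimages -list no-more-images-in-sample.pdf | count_image_rows
```

What the tool counts as an image may differ from what the script filters (see the note on patterns above), so treat the two numbers as a sanity check rather than proof.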