
Automating PDF to PDF/A-2b conversion using Ghostscript: how to overcome an ICC color profile error?


I'm trying to figure out how to convert PDFs to PDF/A-2b format for archiving. The batch process only takes 600-800 files at a time, and we have over half a million files. At this rate it would take an eternity (probably 20 months). Any help is appreciated. Is there a way to automate this within Adobe, or could anyone point me in the right direction with respect to open-source scripts?

Note: apart from Adobe tools, I've also tried Ghostscript, but I'm hitting a wall when it comes to adding the proper color profiles.

PDF validation error without adding the ICC profile:

Device process color used but no PDF/A OutputIntent
    Has Output Intent
    Base color space name
    Outside visible page area

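Ghostscript parameters in Python:

import subprocess

# Values as used in the failing run (see the error log)
color_conversion_strategy = "CMYK"
process_color_model = "DeviceCMYK"
icc_profile_path = "../packages/Adobe ICC Profiles (end-user)/Generic Gray Gamma 2.2 Profile.icc"  # note: a Gray profile
input_pdf = "../resources/test.pdf"
output_pdf = "../resources/output_pdfa2b.pdf"

gs_command = [
    r"C:\Program Files\gs\gs9.55.0\bin\gswin64c.exe",  # Full path to the Ghostscript executable
    "-dPDFA=2",
    "-dBATCH",
    "-dNOPAUSE",
    "-sDEVICE=pdfwrite",
    f"-sColorConversionStrategy={color_conversion_strategy}",
    f"-sProcessColorModel={process_color_model}",
    f"-sOutputICCProfile={icc_profile_path}",  # Path to the ICC profile
    "-dPDFACompatibilityPolicy=1",  # numeric parameter, so -d rather than -s
    f"-sOutputFile={output_pdf}",
    input_pdf,
]

# check=True raises CalledProcessError on a non-zero exit status
subprocess.run(gs_command, check=True)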
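Since the real problem is half a million files, the single-file command above matters less than running it unattended, in parallel, and restartably. This is a minimal sketch, assuming Python 3.9+ and assuming each file is rewritten with the switch set -dPDFA=2 -dPDFACompatibilityPolicy=1 -sColorConversionStrategy=UseDeviceIndependentColor (an assumption: swap in whatever single-file command validates for you). The function names and the "-pdfA-2b" output suffix are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_pdfa_command(gs_exe: str, src: Path, dst: Path) -> list[str]:
    """Ghostscript argument list for one PDF -> PDF/A-2b rewrite (assumed switch set)."""
    return [
        gs_exe,
        "-dPDFA=2",
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-dPDFACompatibilityPolicy=1",
        "-sColorConversionStrategy=UseDeviceIndependentColor",
        f"-sOutputFile={dst}",
        str(src),
    ]


def convert_folder(gs_exe: str, in_dir: Path, out_dir: Path,
                   workers: int = 4, runner=subprocess.run) -> list[tuple[Path, str]]:
    """Convert every *.pdf in in_dir; return (source, stderr) for each failure.

    Files whose output already exists are skipped, so an interrupted run
    can simply be restarted.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = []
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / f"{src.stem}-pdfA-2b.pdf"
        if not dst.exists():
            jobs.append((src, dst))

    def run_one(job):
        src, dst = job
        result = runner(build_pdfa_command(gs_exe, src, dst),
                        capture_output=True, text=True)
        if result.returncode != 0:
            dst.unlink(missing_ok=True)  # drop partial output so a rerun retries it
        return src, result.returncode, result.stderr

    failures = []
    # Threads are enough here: each worker just waits on its own gs process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for src, code, stderr in pool.map(run_one, jobs):
            if code != 0:
                failures.append((src, stderr))
    return failures
```

Usage would look like convert_folder(r"C:\Program Files\gs\gs10.03.1\bin\gswin64c.exe", Path(r"D:\in"), Path(r"D:\out"), workers=8), with the paths adjusted to your setup. The returned failures list gives you the files to inspect by hand instead of re-running the whole batch.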

Error:

Error: /undefined in --runpdf-- Operand stack: --nostringval-- 1
0 --nostringval-- ( **** Error: PDF interpreter encountered an error processing the file.\n) Execution stack: %interp_exit
.runexec2 --nostringval-- runpdf --nostringval-- 2
%stopped_push --nostringval-- runpdf runpdf false 1
%stopped_push 1949 1 3 %oparray_pop 1948 1 3
%oparray_pop 1933 1 3 %oparray_pop 1934 1 3
%oparray_pop runpdf Dictionary stack: --dict:753/1123(ro)(G)--
--dict:0/20(G)-- --dict:86/200(L)-- --dict:2/10(L)--
Current allocation mode is local
Last OS error: Permission denied

Error: Command '['D:\Projects\PDFProcessing\packages\gs10.03.1\bin\gswin64c.exe', '-dPDFA=2', '-dBATCH', '-dNOPAUSE', '-sProcessColorModel=DeviceCMYK', '-sDEVICE=pdfwrite', '-sColorConversionStrategy=CMYK', '-sProcessColorModel=DeviceCMYK', '-sOutputICCProfile=../packages/Adobe ICC Profiles (end-user)/Generic Gray Gamma 2.2 Profile.icc', '-sPDFACompatibilityPolicy=1', '-sOutputFile=../resources/output_pdfa2b.pdf', '../resources/test.pdf']' returned non-zero exit status 1.
GPL Ghostscript 10.03.1: Unrecoverable error, exit code 1
Command output: None
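Besides the "Permission denied", the log shows one mismatch that may also contribute: the run asks for CMYK (ColorConversionStrategy=CMYK, ProcessColorModel=DeviceCMYK) but passes "Generic Gray Gamma 2.2", which is a Gray profile. An ICC profile's 128-byte header records its data colour space at bytes 16-19 ("GRAY", "RGB ", "CMYK") and carries the signature "acsp" at bytes 36-39, so a quick check like this sketch can catch the mismatch before launching thousands of conversions. The helper names are hypothetical, not part of Ghostscript.

```python
# ICC data colour space signatures (header bytes 16-19) expected for each
# Ghostscript ProcessColorModel value.
EXPECTED_SIGNATURE = {
    "DeviceGray": b"GRAY",
    "DeviceRGB": b"RGB ",
    "DeviceCMYK": b"CMYK",
}


def icc_colour_space(header: bytes) -> bytes:
    """Return the 4-byte data colour space signature from an ICC profile header."""
    # Every ICC profile starts with a 128-byte header with 'acsp' at offset 36.
    if len(header) < 128 or header[36:40] != b"acsp":
        raise ValueError("not an ICC profile header")
    return header[16:20]


def profile_matches_model(profile_path: str, process_color_model: str) -> bool:
    """True if the profile's colour space suits the given ProcessColorModel."""
    with open(profile_path, "rb") as f:
        header = f.read(128)
    return icc_colour_space(header) == EXPECTED_SIGNATURE[process_color_model]
```

For the run in the log, profile_matches_model(icc_profile_path, "DeviceCMYK") would be expected to return False, which points to using a CMYK profile for a CMYK conversion (or a Gray strategy for a Gray profile).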


Solution

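  • PDF/A comes in many flavours, but the simplest requirements are to remove incompatible objects and embed missing fonts. A colour output "intent" is a more draconian requirement, best avoided if possible, for example by converting all colour to device-independent colour so that no OutputIntent is needed.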

    Thus the simplest Ghostscript command just writes the file with the PDF/A-2b markers, and Ghostscript can do that easily with only a few switches.

    As a batch .CMD file that you can drag and drop a PDF onto (or call with a filename), this would be the minimum:

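    rem Adjust to your Ghostscript install (64-bit builds ship gswin64c.exe)
    set "GSC=%ProgramFiles%\gs\gs10.03.1\bin\gswin64c.exe"
    rem %~dpn1 is the drive, path and name of the dropped file, without its extension
    set "in=%~dpn1.pdf"
    set "out=%~dpn1-pdfA-2b.pdf"
    "%GSC%" -sDEVICE=pdfwrite -dPDFA=2 -dPDFACompatibilityPolicy=1 -sColorConversionStrategy=UseDeviceIndependentColor -o"%out%" -f "%in%"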
    

    Running this on the given test file, we can verify that the output is valid PDF/A-2b.

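    It gets a lot more complex if you need to add a colour output intent. In that case, follow the Ghostscript documentation to edit your own copy of the PostScript definition file (Ghostscript ships PDFA_def.ps as a template) so that it points at your ICC profile, and include that file on the command line.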

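    So beware: after editing the profile file, include it together with switches that allow Ghostscript to read it, --permit-file-read=<icc profile name> and/or --permit-file-read="profile.PS". This avoids the "Permission denied" error, because since version 9.50 Ghostscript runs in SAFER mode by default and refuses to open files that have not been explicitly permitted.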
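The output-intent route above can be driven from Python by building the argument list explicitly. This is a minimal sketch, assuming you have made an edited copy of PDFA_def.ps that names your ICC profile and that the profile's colour space matches the chosen strategy (a CMYK profile for CMYK). The function name and file names are hypothetical; the point is the ordering (definition file before the input PDF) and the two --permit-file-read switches.

```python
# Colour conversion strategies that pair with a device OutputIntent profile
DEVICE_STRATEGIES = ("Gray", "RGB", "CMYK")


def build_pdfa_intent_command(gs_exe, def_ps, icc_profile, strategy, src, dst):
    """Argument list for a PDF/A-2b conversion with an explicit OutputIntent.

    def_ps is an edited copy of Ghostscript's PDFA_def.ps that names icc_profile.
    """
    if strategy not in DEVICE_STRATEGIES:
        raise ValueError(f"expected one of {DEVICE_STRATEGIES}, got {strategy!r}")
    return [
        str(gs_exe),
        "-dPDFA=2",
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-dPDFACompatibilityPolicy=1",
        f"-sColorConversionStrategy={strategy}",
        # SAFER is the default, so explicitly allow reading the profile and the .ps file
        f"--permit-file-read={icc_profile}",
        f"--permit-file-read={def_ps}",
        f"-sOutputFile={dst}",
        str(def_ps),  # the definition file is processed before the input PDF
        str(src),
    ]
```

The resulting list can be passed straight to subprocess.run(..., check=True), the same way as a single-file command.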