
Wrong encoding when updating PDF metadata using Ghostscript and pdfmark


I have a base PDF file and want to change its title to a Chinese (UTF-8) string using Ghostscript and pdfmark, with a command like the one below:

gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=result.pdf base.pdf pdfmarks

The pdfmarks file (encoded as UTF-8 without a BOM) is:

[ /Title (敏捷开发)
/Author (Larry Cai)
/Producer (xdvipdfmx (0.7.8))
/DOCINFO pdfmark

The command runs successfully, but when I check the properties of result.pdf:

The title has been changed to æŁ‘æ“·å¼•å‘

Any hints on how to solve this? Is there a parameter for the gs command or for pdfmark that I am missing?


Solution

  • The PDF Reference states that the Title entry in the document information dictionary is of type 'text string'. A text string is encoded either in PDFDocEncoding or in UTF-16BE with a leading byte order mark (see page 158 of the PDF Reference, version 1.7).

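    So you cannot specify a Title using UTF-8 without a BOM. A string that does not start with the UTF-16BE byte order mark is read as PDFDocEncoding, which is why each byte of the UTF-8 title shows up as a separate, unrelated character.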

    I would imagine that if you replace the Title string with the same text encoded as UTF-16BE, prefixed with the BOM (bytes FE FF), it will work properly. I would suggest using a hex string (written between < and >) rather than a regular PostScript string in parentheses, simply for ease of use.
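
As a rough illustration of the hex-string suggestion, here is a minimal Python sketch that builds such a string. The helper name `pdfmark_hex_string` is made up for this example and is not part of Ghostscript or pdfmark; the script only prints a pdfmarks file with the Title rewritten as a hex string and the other entries copied from the question.

```python
import codecs


def pdfmark_hex_string(text: str) -> str:
    """Return text as a PostScript hex string: UTF-16BE with a leading BOM.

    Hypothetical helper for illustration; the name is not from any library.
    """
    # codecs.BOM_UTF16_BE is the two bytes FE FF.
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    # PostScript hex strings are written between angle brackets.
    return "<" + data.hex().upper() + ">"


if __name__ == "__main__":
    title = pdfmark_hex_string("敏捷开发")
    # The remaining entries are copied unchanged from the question.
    print(f"[ /Title {title}")
    print("/Author (Larry Cai)")
    print("/Producer (xdvipdfmx (0.7.8))")
    print("/DOCINFO pdfmark")
```

For the title in the question, the helper returns <FEFF654F63775F0053D1>. The printed output is plain ASCII, so the resulting pdfmarks file no longer depends on the encoding it is saved in, and it can be passed to the same gs command as before.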