Converting PDF to CMYK (with identify recognizing CMYK)

I am having a lot of trouble getting ImageMagick's identify to, well, identify a PDF as CMYK.

Essentially, let's say I'm building this file, test.tex, with pdflatex:

\documentclass[a4paper,12pt]{article}

%% https://tex.stackexchange.com/questions/13071
\pdfcompresslevel=0

%% http://compgroups.net/comp.text.tex/Making-a-cmyk-PDF
%% ln -s /usr/share/color/icc/sRGB.icm .
% \immediate\pdfobj stream attr{/N 4} file{sRGB.icm}
% \pdfcatalog{%
% /OutputIntents [ <<
% /Type /OutputIntent
% /S/GTS_PDFA1
% /DestOutputProfile \the\pdflastobj\space 0 R
% /OutputConditionIdentifier (sRGB IEC61966-2.1)
% /Info(sRGB IEC61966-2.1)
% >> ]
% }

%% http://latex-my.blogspot.com/2010/02/cmyk-output-for-commercial-printing.html
%% https://tex.stackexchange.com/questions/9961
\usepackage[cmyk]{xcolor}

\begin{document}
Some text here...
\end{document}

If I then run identify on the resulting test.pdf, it is reported as RGB no matter which options I've tried (at least those suggested by the links in the source), and yet the colors in it appear to be stored as CMYK; for the source above:

$ grep -ia 'cmyk\|rgb\| k' test.pdf 
0 0 0 1 k 0 0 0 1 K
0 0 0 1 k 0 0 0 1 K
0 0 0 1 k 0 0 0 1 K
0 0 0 1 k 0 0 0 1 K
FontDirectory/CMR12 known{/CMR12 findfont dup/UniqueID known{dup
/PTEX.Fullbanner (This is pdfTeX, Version 3.1415926-1.40.11-2.2 (TeX Live 2010) kpathsea version 6.0.0)
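Note that the grep above only matches the literal strings "cmyk" and "rgb" plus " k", so it would not catch the DeviceRGB operators rg/RG or the gray operators g/G. A slightly broader check could be sketched as the shell function below. The name count_color_ops is hypothetical; it is a crude heuristic, not a PDF parser, and it assumes an uncompressed file such as the one produced with \pdfcompresslevel=0.

```shell
#!/bin/sh
# count_color_ops: hypothetical helper, a crude heuristic (not a PDF parser).
# Reads an *uncompressed* PDF on stdin and counts whitespace-separated
# device colour operators: k/K (CMYK), rg/RG (RGB), g/G (gray).
# Operators for other colour spaces (cs/CS, sc/scn) are not counted, and
# binary font or image data may produce false matches.
count_color_ops() {
  LC_ALL=C awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "k" || $i == "K") cmyk++
        else if ($i == "rg" || $i == "RG") rgb++
        else if ($i == "g" || $i == "G") gray++
      }
    }
    END { printf "cmyk=%d rgb=%d gray=%d\n", cmyk, rgb, gray }
  '
}
```

Usage would be count_color_ops < test.pdf; a file whose text colour is set only through xcolor's cmyk option should show a non-zero cmyk count and, ideally, no rgb operators.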

$ identify -verbose 'test.pdf[0]'
...
  Type: Palette
  Endianess: Undefined
  Colorspace: RGB
  Depth: 16/8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
  Channel statistics:
    Red:
...
    Green:
...
    Blue:
...
  Histogram:
         5: (12593,11565,11822) #31312D2D2E2E rgb(49,45,46)
         4: (16448,15420,15677) #40403C3C3D3D rgb(64,60,61)
         9: (20303,19275,19532) #4F4F4B4B4C4C rgb(79,75,76)
        25: (23901,23130,23387) #5D5D5A5A5B5B rgb(93,90,91)
...

Pretty much the same happens if I also uncomment the \immediate\pdfobj stream ... part. And if there is only one color (black) in the document, I don't see where identify comes up with a histogram of RGB values (although, arguably, all of them are close to gray)?!

So, never mind this; I thought I'd better use Ghostscript to convert test.pdf into a new PDF that identify would recognize as CMYK, but no luck even there:

$ gs -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite  -sOutputFile=test-gs.pdf -dUseCIEColor -sProcessColorModel=DeviceRGB -dProcessColorModel=/DeviceCMYK -sColorConversionStrategy=/CMYK test.pdf 

GPL Ghostscript 9.01 (2011-02-07)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1


$ identify -verbose 'test-gs.pdf[0]'
...
  Type: Grayscale
  Base type: Grayscale
  Endianess: Undefined
  Colorspace: RGB
  Depth: 16/8-bit
...

So the only change identify perceives is Type: Grayscale (previously Type: Palette); otherwise it still sees an RGB colorspace!

Note also that identify is capable of correctly reporting a CMYK PDF; see "CMYK poster example: fitting pdf page size to (bitmap) image size?" (#17843, TeX - LaTeX Stack Exchange) for a command-line example of generating such a PDF file using convert and gs. In fact, we can execute:

convert test.pdf -depth 8 -colorspace cmyk -alpha Off test-c.pdf

... and this results in a PDF that identify reports as CMYK; however, the PDF will also be rasterized (at the default 72 dpi).

EDIT: I have just discovered that if I create an .odp presentation in OpenOffice and export it to PDF, that PDF will be RGB by default; however, the following command (from "ghostscript Examples | Production Monkeys"):

# Color PDF to CMYK:
gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite \
-sColorConversionStrategy=CMYK -dProcessColorModel=/DeviceCMYK \
-sOutputFile=output.pdf input.pdf

... actually produces a CMYK PDF, reported as such by identify (although the black will be rich rather than plain, i.e. spread over all four channels); however, this command works only when the slide contains an image (apparently, the image is what triggers the color conversion?!). Funnily, I cannot get the same effect from a pdflatex PDF.

So I guess my question can be asked two ways:

Are there any command-line conversion methods on Linux that will convert an RGB PDF into a CMYK PDF while preserving vectors, such that identify recognizes the result as CMYK (and consequently builds a correct histogram of CMYK colors)?
Are there any other command-line Linux tools, similar to identify, that would correctly recognize the use of CMYK colors even in the original test.pdf from pdflatex (and possibly build a color histogram for an arbitrarily chosen PDF page, as identify is supposed to)?

Thanks in advance for any answers,
Cheers!

Some references:

adobe - Script (or some other means) to convert RGB to CMYK in PDF? - Stack Overflow
color - PDF colour model and LaTeX - TeX - LaTeX - Stack Exchange
color - Option cmyk for xcolor package does not produce a CMYK PDF - TeX - LaTeX - Stack Exchange
Making a cmyk PDF - comp.text.tex | Computer Group
colormanagement with ghostscript ? - Rhinocerus:

Is it for instance specified as "0 0 0 1 setcmykcolor"? Or possibly rather as "0 0 0 setrgbcolor"? In the latter case you would end up with a rich black for text, if DeviceRGB is remapped to a CIE-based color space in order to get RGB images color managed.

Solution

sdaau, the command you used to try converting your PDF to CMYK was not correct (among other things, it sets ProcessColorModel twice, once to DeviceRGB and once to /DeviceCMYK). Try this one instead:

 gs \
   -o test-cmyk.pdf \
   -sDEVICE=pdfwrite \
   -sProcessColorModel=DeviceCMYK \
   -sColorConversionStrategy=CMYK \
   -sColorConversionStrategyForImages=CMYK \
    test.pdf
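For repeated use, the command above could be wrapped in a small shell function. This is only a sketch: the name pdf_to_cmyk and the GS variable (which lets you substitute echo for a dry run) are illustrative assumptions, not part of Ghostscript. Extra options are passed through before the input file, since Ghostscript applies options that precede the file name.

```shell
#!/bin/sh
# pdf_to_cmyk: hypothetical wrapper around the Ghostscript command above.
# Usage: pdf_to_cmyk input.pdf output.pdf [extra gs options...]
# Set GS=echo to print the argument list instead of running Ghostscript.
pdf_to_cmyk() {
  if [ "$#" -lt 2 ]; then
    echo "usage: pdf_to_cmyk input.pdf output.pdf [gs options...]" >&2
    return 2
  fi
  src=$1
  dst=$2
  shift 2
  # -o implies -dBATCH -dNOPAUSE; any extra options must precede the input file
  "${GS:-gs}" \
    -o "$dst" \
    -sDEVICE=pdfwrite \
    -sProcessColorModel=DeviceCMYK \
    -sColorConversionStrategy=CMYK \
    -sColorConversionStrategyForImages=CMYK \
    "$@" \
    "$src"
}
```

For example, pdf_to_cmyk test.pdf test-cmyk.pdf -dOverrideICC -dEncodeColorImages=false adds the options discussed in the updates below.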

Update

If color conversion does not work as desired and if you see a message like "Unable to convert color space to Gray, reverting strategy to LeaveColorUnchanged" then...

your Ghostscript is probably a newer release from the 9.x version series, and
your source PDF likely uses an embedded ICC color profile

In this case add -dOverrideICC to the command line and see if it changes the result as desired.

Update 2

To avoid JPEG artifacts appearing in the images (where there were none before), add:

-dEncodeColorImages=false

into the command line.

(This is true for almost all GS PDF->PDF processing, not just for this case: when asked to produce PDF output, GS by default creates a completely new file with newly constructed objects and a new file structure. It doesn't simply re-use the previous objects, as a more "dumb" PDF processor like pdftk does (pdftk has other advantages, though; don't misunderstand my statement!). GS applies JPEG compression by default; look at the current Ps2pdf documentation and search for "ColorImageFilter" to learn more details.)
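Regarding the second part of the question (a tool that reports CMYK usage), newer Ghostscript releases include an inkcov output device, which rasterizes each page and prints its C, M, Y and K ink coverage, e.g. gs -q -o - -sDEVICE=inkcov test-cmyk.pdf. The function below is a hypothetical helper that assumes inkcov's per-page line format (four coverage fractions followed by "CMYK OK") and flags pages that use any C, M or Y ink, which is one way to spot the rich black mentioned in the question. Because inkcov measures the rendered raster, for RGB input the figures reflect Ghostscript's own conversion rather than the colour operators stored in the file.

```shell
#!/bin/sh
# summarize_inkcov: hypothetical helper; reads Ghostscript inkcov output on
# stdin, assuming per-page lines of the form "C M Y K CMYK OK", and reports
# per page whether only K ink, any C/M/Y ink, or no ink at all is used.
# Other lines (e.g. "Page 1" when gs is run without -q) are ignored.
summarize_inkcov() {
  awk '
    $5 == "CMYK" {
      page++
      if ($1 + 0 == 0 && $2 + 0 == 0 && $3 + 0 == 0) {
        if ($4 + 0 == 0) printf "page %d: no ink\n", page
        else printf "page %d: K only\n", page
      } else {
        printf "page %d: uses C/M/Y (possibly rich black)\n", page
      }
    }
  '
}
```

A possible invocation is gs -q -o - -sDEVICE=inkcov test-cmyk.pdf | summarize_inkcov.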