Tags: pdf, pdfbox, color-space, pdfclown

Unable to extract CMYK color spaces from PDF


I'm trying to extract color space data from a PDF. I have a file with Pantone and CMYK color spaces. Whichever PDF library I use to extract the color spaces (I tried PDF Clown, PDFBox and ICEpdf), the output only contains the Pantone color spaces, not a single piece of information about the CMYK color space. When I examine the file in CorelDraw and click on a color space, it shows its exact value, like PANTONE 3735 C or C 0 M 50 Y 50 K 0. How can I extract all the color spaces (Pantone and CMYK) present in a PDF?

using System;
using System.Collections.Generic;
using System.Linq;                                    // for ToList()
using org.pdfclown.documents.contents;                // ContentScanner
using org.pdfclown.documents.contents.colorSpaces;    // ColorSpace

using (var file = new org.pdfclown.files.File(filePath))
{
    org.pdfclown.documents.Document document = file.Document;

    foreach (org.pdfclown.documents.Page page in document.Pages)
    {
        ContentScanner cs = new ContentScanner(page); // Wraps the page contents into the scanner.

        List<ColorSpace> list = cs.Contents.ContentContext.Resources.ColorSpaces.Values.ToList();
        for (int i = 0; i < list.Count; i++)
        {
            // Print list of colorspaces available (here: the kind of each resource)
            Console.WriteLine(list[i].GetType().Name);
        }
    }
}

Sample PDF document with CMYK and PANTONE colors

Output from PDF Clown showing the PANTONE color spaces and their alternate color spaces:

screen shot


Solution

  • Original answer

    Unfortunately you don't show your code, but your screenshot suggests that you merely look at the ColorSpace section of the page Resources. That does not suffice in a number of ways:

    And there may well be one or twenty other places to look for relevant color space settings that I forgot...

    For your file, though, the places mentioned above already show that, in addition to the color spaces in your ColorSpace resources, DeviceGray, DeviceRGB, and DeviceCMYK are also used in your PDF.
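    To illustrate why the ColorSpace resources alone miss these: the device color operators g/G, rg/RG and k/K select DeviceGray, DeviceRGB and DeviceCMYK implicitly, while cs/CS name their color space explicitly (see the color operators in ISO 32000-1, section 8.6.8). The following is a minimal, self-contained sketch of that idea, not part of PDF Clown or PDFBox: the class ColorOperatorScan and its method UsedColorSpaces are hypothetical names, and it assumes you already have a decompressed content stream as text with whitespace-separated tokens. It ignores string literals, inline images, form XObjects, patterns and annotation appearances, so it is no substitute for a real content parser such as PDF Clown's ContentScanner.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper, not part of PDF Clown or PDFBox. A naive sketch that
    // assumes a decompressed content stream given as text with whitespace-separated
    // tokens; string literals, inline images and form XObjects are not handled.
    public static class ColorOperatorScan
    {
        // Returns the color spaces selected by color operators in the stream.
        public static SortedSet<string> UsedColorSpaces(string content)
        {
            var used = new SortedSet<string>(StringComparer.Ordinal);
            string[] tokens = content.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < tokens.Length; i++)
            {
                switch (tokens[i])
                {
                    // Device color operators imply their color space, so it
                    // never needs an entry in the page's ColorSpace resources.
                    case "g": case "G": used.Add("DeviceGray"); break;
                    case "rg": case "RG": used.Add("DeviceRGB"); break;
                    case "k": case "K": used.Add("DeviceCMYK"); break;
                    // cs/CS name the color space explicitly: either a device family
                    // such as /DeviceCMYK or a key of the ColorSpace resources.
                    case "cs":
                    case "CS":
                        if (i > 0 && tokens[i - 1].StartsWith("/", StringComparison.Ordinal))
                            used.Add(tokens[i - 1].Substring(1));
                        break;
                }
            }
            return used;
        }

        public static void Main()
        {
            // Made-up sample stream: a named color space, a CMYK fill and a gray fill.
            string sample = "q /CS0 cs 1 scn 0 0 100 100 re f 0 0.5 0.5 0 k 10 10 50 50 re f 0 g Q";
            foreach (string name in UsedColorSpaces(sample))
                Console.WriteLine(name);
        }
    }
    ```

    For the made-up sample, the set contains CS0 (the only name a resource-based listing would reveal) plus DeviceCMYK and DeviceGray, which never appear in the ColorSpace resources.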

    On the comments

    As you have meanwhile provided code, and that code uses PDF Clown, I'll use PDF Clown here, too. You can do the equivalent with PDFBox.

    Scan through a content stream

    A: How do I scan through a content stream? (I checked the BaseDataObject of the 'Contents'; it looks like this: '[0] {cm [1, 0, 0, 1, 0, 0]}, [1] {gs [GS11]}')

    With PDF Clown you usually scan through a content stream using a ContentScanner, and your code already has a ContentScanner cs. Thus, simply call ScanForColorspaceUsage(cs) in your loop, with ScanForColorspaceUsage defined like this:

    using System;
    using org.pdfclown.documents.contents;          // ContentScanner
    using org.pdfclown.documents.contents.objects;  // ContentObject, CompositeObject, color operators

    // Recursively walks the content objects of a scanner level and reports
    // every operator that selects a fill or stroke color space.
    void ScanForColorspaceUsage(ContentScanner cs)
    {
        while (cs.MoveNext())
        {
            ContentObject content = cs.Current;
            if (content is CompositeObject)
            {
                // Descend into nested levels (e.g. q/Q and BT/ET blocks).
                ScanForColorspaceUsage(cs.ChildLevel);
            }
            else if (content is SetFillColorSpace fillColorSpace)        // cs operator
            {
                Console.WriteLine("Used as fill color space: {0}", fillColorSpace.Name);
            }
            else if (content is SetDeviceCMYKFillColor)                  // k operator
            {
                Console.WriteLine("Used as fill color space: DeviceCMYK");
            }
            else if (content is SetDeviceGrayFillColor)                  // g operator
            {
                Console.WriteLine("Used as fill color space: DeviceGray");
            }
            else if (content is SetDeviceRGBFillColor)                   // rg operator
            {
                Console.WriteLine("Used as fill color space: DeviceRGB");
            }
            else if (content is SetStrokeColorSpace strokeColorSpace)    // CS operator
            {
                Console.WriteLine("Used as stroke color space: {0}", strokeColorSpace.Name);
            }
            else if (content is SetDeviceCMYKStrokeColor)                // K operator
            {
                Console.WriteLine("Used as stroke color space: DeviceCMYK");
            }
            else if (content is SetDeviceGrayStrokeColor)                // G operator
            {
                Console.WriteLine("Used as stroke color space: DeviceGray");
            }
            else if (content is SetDeviceRGBStrokeColor)                 // RG operator
            {
                Console.WriteLine("Used as stroke color space: DeviceRGB");
            }
        }
    }
    

    All colorspaces

    B: Whether or not a color space is used, I want to display all the color spaces available in the PDF. When I checked the above document in CorelDraw, it displayed around 30-35 CMYK color spaces (in the second row of its horizontal array of color spaces).

    Going through your document, I found that whenever CMYK color is used, it is used via the DeviceCMYK color space, not via some special ICCBased one. Thus, only a single CMYK color space is used in your PDF.

    I don't have CorelDraw, so I cannot tell what exactly it shows you. Or do you mean individual CMYK colors?
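    If CorelDraw's 30-35 CMYK entries are in fact individual colors, a single color space (DeviceCMYK) can still carry many distinct values, one per k/K operand tuple. The following minimal sketch collects those distinct tuples and formats them in the "C 0 M 50 Y 50 K 0" style from the question. It is not part of any PDF library: the class CmykColorCollector and its method DistinctCmykColors are hypothetical, and it assumes a decompressed content stream as text with whitespace-separated tokens. It only looks at k and K; CMYK values set via sc/scn after /DeviceCMYK cs, or inside form XObjects, are not covered.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Globalization;

    // Hypothetical helper, not part of PDF Clown or PDFBox. Naive assumption:
    // a decompressed content stream as text with whitespace-separated tokens.
    public static class CmykColorCollector
    {
        // Collects the distinct operands of k (fill) and K (stroke), formatted
        // as percentages like "C 0 M 50 Y 50 K 0".
        public static SortedSet<string> DistinctCmykColors(string content)
        {
            var colors = new SortedSet<string>(StringComparer.Ordinal);
            string[] tokens = content.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            // k/K take four operands, so the operator sits at index 4 or later.
            for (int i = 4; i < tokens.Length; i++)
            {
                if (tokens[i] != "k" && tokens[i] != "K")
                    continue;
                var percent = new int[4];
                bool numeric = true;
                for (int j = 0; j < 4 && numeric; j++)
                {
                    numeric = double.TryParse(tokens[i - 4 + j], NumberStyles.Float,
                                              CultureInfo.InvariantCulture, out double value);
                    percent[j] = (int)Math.Round(value * 100);
                }
                if (numeric)
                    colors.Add($"C {percent[0]} M {percent[1]} Y {percent[2]} K {percent[3]}");
            }
            return colors;
        }

        public static void Main()
        {
            // Made-up sample: the same CMYK value used for fill and stroke, plus a yellow fill.
            string sample = "0 0.5 0.5 0 k 0 0 100 100 re f 0 0.5 0.5 0 K 0 0 1 0 k 0 0 100 100 re f";
            foreach (string color in DistinctCmykColors(sample))
                Console.WriteLine(color);
        }
    }
    ```

    Using a set means a color that is set many times (or for both fill and stroke) is listed once, which is roughly what a swatch palette would show.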

    Learn deeper

    C: Where can I learn more about these things to understand them better?

    If by "these things" you mean how all this is represented in PDFs, the PDF specification is a good reference. The most current one, ISO 32000-2, is only available for purchase, e.g. from the ISO store, but Adobe shares the older ISO 32000-1 for free download as PDF32000_2008.pdf.