python pdf colors cmyk spotcolor

Calculate CMYK and spot color coverage of a PDF with Python


I can't find any free or open-source library that calculates CMYK and spot color coverage for a PDF. I would be grateful if someone could point me in the right direction: how can I access the color channels and calculate the percentage of each color used (C, M, Y, K and spot colors, each exported separately) with Python?

Note: I have no problem extracting C, M, Y and K, because I can easily get them from an image. The problem is that when I add spot colors, they are converted to CMYK again.

That's why I'm trying to do this on the PDF itself.

Thanks


Solution

  • I hope this will be useful for those who have a similar problem in the future.

    Dependencies: Ghostscript, Pillow

    How does it work? Ghostscript separates the colors (C, M, Y, K and spots) and saves each one as a .tiff file, and Pillow calculates the percentage of color used in each file. (Each file saved by Ghostscript is in grayscale mode and has a single channel with values from 0 to 255.)

    Note: Before running this, make sure Ghostscript is installed.
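
One way to check that Ghostscript is installed before running the code is to look for its executable on PATH. This is only a minimal sketch: the helper name `find_ghostscript` is hypothetical, and the Windows executable names in the list are an assumption about a typical install.

```python
import shutil

# Executable names to try. "gs" is the name used in the command in this answer;
# gswin64c / gswin32c are assumed names for the Windows console builds.
CANDIDATES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    gs_path = find_ghostscript()
    print(gs_path or "Ghostscript was not found on PATH")
```

If the function returns None, the `gs` command used in the code will fail, so it is worth checking first.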

    from PIL import Image
    import fnmatch
    import os
    import subprocess
    
    def pdf_color_splitter():
    
        # Folder where the separated color plates are written
        path = 'image_inputs/'
        os.makedirs(path, exist_ok=True)
        
        # Run the Ghostscript tiffsep device to save the separated colors as tiff files
        subprocess.run(['gs', '-sDEVICE=tiffsep', '-o', f'{path}c.tiff', 'cmyk_calculate/2021.pdf'], check=True)
    
        # Get all .tiff files
        files = fnmatch.filter(os.listdir(path), '*.tiff')
    
        # Calculate the coverage of each color separately
        split_colors = []
        for f in files:
    
            with Image.open(path + f) as o_file:
                image_sizew, image_sizeh = o_file.size  # width, height
                count = image_sizeh * image_sizew
    
                val = 0  # accumulated ink percentage of the colored pixels
    
                for i in range(image_sizew):
                    for j in range(image_sizeh):  # start at row 0 so no row is skipped
                        pixVal = o_file.getpixel((i, j))
                        # Skip white pixels and tuple values (multi-channel images)
                        if pixVal != 255 and type(pixVal) != tuple:
                            val += 100 - (pixVal // 2.55)  # see the explanation below this code
    
            resp = {'name': f.split('.')[0].replace('c(', '').replace(')', ''), 'coverage': val / count}
            split_colors.append(resp)
            os.remove(path + f)  # remove the .tiff file once it has been measured
            
        return split_colors
    

    Look at this line:

    val+= 100 - (pixVal//2.55)
    

    So what was this for?

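    We want a number in the range 0 to 100 because CMYK values are expressed as percentages, so the grayscale value (0 to 255) is divided by 2.55. The result is subtracted from 100 because the image is in grayscale mode, where white (255) means no color and black (0) means full color, so the subtraction gives the correct color density.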
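
For large pages the pixel-by-pixel loop can be slow. As a possible alternative, the same idea can be sketched with Pillow's `Image.histogram()`, which counts how many pixels have each gray level. The function name `coverage_percent` is hypothetical, and this version does not round each pixel down, so its result may differ slightly from the loop above.

```python
from PIL import Image


def coverage_percent(plate):
    """Mean color coverage (0 to 100) of an 8-bit grayscale separation.

    Assumes 0 means full color and 255 means no color.
    """
    if plate.mode != "L":
        raise ValueError(f"expected an 8-bit grayscale image, got mode {plate.mode!r}")
    histogram = plate.histogram()  # 256 counts, one per gray level
    total_pixels = sum(histogram)
    # Weight each gray level by its distance from white (255 = no color)
    ink = sum(count * (255 - level) for level, count in enumerate(histogram))
    return 100 * ink / (255 * total_pixels)


if __name__ == "__main__":
    # Synthetic 2x2 plate: one fully colored pixel (0) and three white pixels (255)
    plate = Image.new("L", (2, 2), 255)
    plate.putpixel((0, 0), 0)
    print(coverage_percent(plate))  # 25.0, one of four pixels is fully colored
```

In the function above this would replace the two nested loops: open each separated .tiff file and pass the image to `coverage_percent`.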