metadata-extractor

Can this library detect if JPG is in RGB or CMYK format?


Thanks for the metadata-extractor library; it's really helpful. It gives me all the information I need except whether a JPG is in RGB or CMYK format. Is that information there and I'm just not seeing it, or does the library not return this attribute?

Thanks


Solution

  • From this document on the Java ImageIO package:

    https://docs.oracle.com/javase/7/docs/api/javax/imageio/metadata/doc-files/jpeg_metadata.html

    When reading, the contents of the stream are interpreted by the usual JPEG conventions, as follows:

    • If a JFIF APP0 marker segment is present, the colorspace is known to be either grayscale or YCbCr. If an APP2 marker segment containing an embedded ICC profile is also present, then the YCbCr is converted to RGB according to the formulas given in the JFIF spec, and the ICC profile is assumed to refer to the resulting RGB space.

    • If an Adobe APP14 marker segment is present, the colorspace is determined by consulting the transform flag. The transform flag takes one of three values:

      • 2 - The image is encoded as YCCK (implicitly converted from CMYK on encoding).
      • 1 - The image is encoded as YCbCr (implicitly converted from RGB on encoding).
      • 0 - Unknown. 3-channel images are assumed to be RGB, 4-channel images are assumed to be CMYK.
    • If neither marker segment is present, the following procedure is followed: Single-channel images are assumed to be grayscale, and 2-channel images are assumed to be grayscale with an alpha channel. For 3- and 4-channel images, the component ids are consulted. If these values are 1-3 for a 3-channel image, then the image is assumed to be YCbCr. Subject to the availability of the optional color space support described above, if these values are 1-4 for a 4-channel image, then the image is assumed to be YCbCrA. If these values are > 4, they are checked against the ASCII codes for 'R', 'G', 'B', 'A', 'C', 'c'. These can encode the following colorspaces:

      • RGB
      • RGBA
      • YCC (as 'Y','C','c'), assumed to be PhotoYCC
      • YCCA (as 'Y','C','c','A'), assumed to be PhotoYCCA

    Otherwise, 3-channel subsampled images are assumed to be YCbCr, 3-channel non-subsampled images are assumed to be RGB, 4-channel subsampled images are assumed to be YCCK, and 4-channel, non-subsampled images are assumed to be CMYK.

    All other images are declared uninterpretable.
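The quoted rules can be written as a single decision function. The sketch below assumes its inputs have already been read from the file: whether a JFIF APP0 segment is present, the Adobe APP14 transform flag (if any), the component ids from the SOF header, and whether the components are subsampled. The class name, method signature and returned strings are hypothetical, and the ICC-profile case and the "optional color space support" caveat from the quote are not modelled.

```java
import java.util.Arrays;

/**
 * Hypothetical helper mirroring the javax.imageio JPEG colour-space rules.
 * Does not model the ICC-profile case or check whether the optional
 * YCbCrA / PhotoYCC colour spaces are actually available.
 */
final class JpegColourSpaceGuess {
    private JpegColourSpaceGuess() {}

    /**
     * @param hasJfif        true if a JFIF APP0 marker segment is present
     * @param adobeTransform Adobe APP14 transform flag (0, 1 or 2), or null if there is no APP14 segment
     * @param componentIds   component ids from the SOF header, in order
     * @param subsampled     true if the components do not all share the same sampling factors
     */
    static String guess(boolean hasJfif, Integer adobeTransform, int[] componentIds, boolean subsampled) {
        int n = componentIds.length;

        // Rule 1: a JFIF APP0 segment means grayscale or YCbCr.
        if (hasJfif) {
            return n == 1 ? "Grayscale" : "YCbCr";
        }

        // Rule 2: the Adobe APP14 transform flag decides.
        if (adobeTransform != null) {
            switch (adobeTransform) {
                case 2: return "YCCK";   // implicitly converted from CMYK on encoding
                case 1: return "YCbCr";  // implicitly converted from RGB on encoding
                case 0: return n == 3 ? "RGB" : n == 4 ? "CMYK" : "Unknown";
                default: return "Unknown";
            }
        }

        // Rule 3: neither segment is present.
        if (n == 1) return "Grayscale";
        if (n == 2) return "GrayscaleAlpha";
        if (n == 3 || n == 4) {
            if (Arrays.equals(componentIds, new int[] {1, 2, 3})) return "YCbCr";
            if (Arrays.equals(componentIds, new int[] {1, 2, 3, 4})) return "YCbCrA";
            // Component ids that spell ASCII letters.
            switch (asAscii(componentIds)) {
                case "RGB":  return "RGB";
                case "RGBA": return "RGBA";
                case "YCc":  return "PhotoYCC";
                case "YCcA": return "PhotoYCCA";
                default: break;
            }
            // Otherwise fall back on whether the components are subsampled.
            if (n == 3) return subsampled ? "YCbCr" : "RGB";
            return subsampled ? "YCCK" : "CMYK";
        }
        return "Unknown"; // "All other images are declared uninterpretable."
    }

    /** Interprets each component id as an ASCII character. */
    private static String asAscii(int[] ids) {
        StringBuilder sb = new StringBuilder();
        for (int id : ids) sb.append((char) id);
        return sb.toString();
    }
}
```

For the original question, a result of "CMYK" or "YCCK" means the image is stored as CMYK, and "RGB" or "YCbCr" means RGB.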

    Metadata Extractor doesn't perform these conversions; however, the procedure quoted above is a tested example of the steps you can take to determine the colour format.
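The inputs those rules consult all live in the JPEG header. metadata-extractor should already expose the SOF and APP14 values through its JpegDirectory and AdobeJpegDirectory, so with the library on the classpath you would normally take them from there. The minimal JDK-only sketch below instead reads them directly, to show where each value comes from. It walks the marker segments up to the start of scan and records whether a JFIF APP0 segment is present, the Adobe APP14 transform flag (byte 11 of that segment's payload), and each component's id and sampling factors from the SOF header. The class, field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: scans the JPEG header (before the first SOS marker). */
final class JpegHeaderScan {
    boolean hasJfif;                  // JFIF APP0 segment present
    Integer adobeTransform;           // Adobe APP14 transform flag, or null if absent
    final List<Integer> componentIds = new ArrayList<>();
    final List<Integer> samplingFactors = new ArrayList<>(); // H in high nibble, V in low nibble

    static JpegHeaderScan scan(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("missing SOI marker");
        }
        JpegHeaderScan r = new JpegHeaderScan();
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("expected a marker at offset " + pos);
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) { pos++; continue; }           // fill byte before a marker
            if (marker == 0xDA || marker == 0xD9) break;       // SOS or EOI: header is over
            if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) { pos += 2; continue; } // no length field
            int len = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF); // includes its own 2 bytes
            int end = pos + 2 + len;                           // exclusive end of this segment
            if (len < 2 || end > jpeg.length) {
                throw new IllegalArgumentException("truncated segment at offset " + pos);
            }
            int data = pos + 4;
            if (marker == 0xE0 && startsWith(jpeg, data, "JFIF\0")) {
                r.hasJfif = true;
            } else if (marker == 0xEE && len >= 14 && startsWith(jpeg, data, "Adobe")) {
                // Payload: "Adobe", version (2), flags0 (2), flags1 (2), transform (1)
                r.adobeTransform = jpeg[data + 11] & 0xFF;
            } else if (isSof(marker) && len >= 8) {
                // Payload: precision (1), height (2), width (2), component count (1),
                // then per component: id (1), sampling factors (1), quant table (1)
                int n = jpeg[data + 5] & 0xFF;
                for (int i = 0; i < n && data + 6 + 3 * i + 2 < end; i++) {
                    int c = data + 6 + 3 * i;
                    r.componentIds.add(jpeg[c] & 0xFF);
                    r.samplingFactors.add(jpeg[c + 1] & 0xFF);
                }
            }
            pos = end;
        }
        return r;
    }

    /** True if the components do not all use the same sampling factors. */
    boolean isSubsampled() {
        return samplingFactors.stream().distinct().count() > 1;
    }

    /** SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC). */
    private static boolean isSof(int m) {
        return m >= 0xC0 && m <= 0xCF && m != 0xC4 && m != 0xC8 && m != 0xCC;
    }

    private static boolean startsWith(byte[] b, int off, String s) {
        if (off + s.length() > b.length) return false;
        for (int i = 0; i < s.length(); i++) {
            if (b[off + i] != (byte) s.charAt(i)) return false;
        }
        return true;
    }
}
```

For a file on disk you could pass the result of `java.nio.file.Files.readAllBytes(path)`. The resulting `hasJfif`, `adobeTransform`, component ids and `isSubsampled()` are the values the quoted rules consult. An Adobe transform of 2, or four non-subsampled components with no JFIF or Adobe segment, points to CMYK.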