
macOS VTCompressionSession: how do I control JPEG chroma subsampling mode? (YUV 4:2:0)


For our current project, we've got a sequence of image frames being generated, which, after some processing, we need to compress in real time with some codec and send over the network. The first implementation needs to use JPEG, although apparently other, more video-focused encodings will be added later.

We've been using Apple's VideoToolbox.framework for the compression, as its JPEG encoder (kCMVideoCodecType_JPEG) is pretty fast, and it'll be the way to go for other formats, especially if hardware acceleration is supported for the codec(s) in question. (It doesn't seem to hardware-accelerate JPEG, FWIW.)

This is all working nicely, except for some typical JPEG ringing artifacts on the output frames. In theory this is no problem, there's a kVTCompressionPropertyKey_Quality property. Unfortunately, it seems tweaking this value implicitly changes the chroma subsampling mode - 0.75 and up seems to switch the encoder from YUV 4:2:0 subsampling to 4:2:2, and somewhere on the way to 1.0 it flips again to 4:4:4. For reasons outside our control, we need the frames to be encoded as 4:2:0 JPEGs, and the quality level of 0.74 is pretty bad. Plus, Apple might change their thresholds in future versions, which would suddenly break our code even if we did stick with 0.74.

Is there a way to manually select the chroma subsampling mode a VTCompressionSession uses?

Already tried: Our source frame data comes in as BGRA, so that's the pixel format we've been using for the source CVPixelBuffer objects. One thought was to do the colour space conversion ourselves and provide pixel buffers in the kCVPixelFormatType_420YpCbCr8BiPlanarFullRange pixel format. Surely the compression session wouldn't upsample that to 4:2:2 or 4:4:4? It turns out it does. Not helpful.
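For reference, here is the layout we were feeding it: a biplanar 4:2:0 buffer carries a full-resolution Y plane plus one interleaved half-resolution CbCr plane. A minimal CoreVideo-free sketch of that geometry (the helper name is ours; assumes even dimensions and ignores row padding, which real CVPixelBuffers may add):

```c
#include <stddef.h>

/* Plane sizes for a biplanar 8-bit 4:2:0 buffer (the layout behind
 * kCVPixelFormatType_420YpCbCr8BiPlanarFullRange): plane 0 is a
 * full-resolution Y plane, plane 1 interleaves Cb and Cr at half
 * resolution in both dimensions.  Assumes even dimensions and no
 * row padding; real CVPixelBuffers may pad each row (bytes-per-row). */
static void biplanar_420_plane_sizes(size_t width, size_t height,
                                     size_t *y_bytes, size_t *cbcr_bytes)
{
    *y_bytes = width * height;                    /* one Y byte per pixel */
    *cbcr_bytes = (width / 2) * (height / 2) * 2; /* Cb+Cr per 2x2 block  */
}
```

For a 1920x1080 frame that works out to 2,073,600 Y bytes plus 1,036,800 CbCr bytes, i.e. half the chroma data of a 4:4:4 representation.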

Any other suggestions? It's not overly clear what properties can be set on the compression session, each frame, pixel buffers, etc. - I've dug through the framework header files, and haven't found anything obvious there, but have I missed something? Or is the only solution to switch to a different JPEG encoder?

Here's our compression session initialisation code, including the quality setting:

const void* keys[] = {
    kVTVideoEncoderSpecification_EnableHardwareAcceleratedVideoEncoder,
};
const void* values[] = {
    kCFBooleanTrue,
};
CFDictionaryRef encoder_spec = CFDictionaryCreate(
    kCFAllocatorDefault,
    keys, values, sizeof(keys) / sizeof(keys[0]),
    &kCFCopyStringDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);

VTCompressionSessionRef session = NULL;
OSStatus error = VTCompressionSessionCreate(
    kCFAllocatorDefault, image_width, image_height, kCMVideoCodecType_JPEG,
    encoder_spec, NULL /* source buffer attributes */, NULL /* allocator */,
    output_callback, vscs /* session refcon */, &session);
CFRelease(encoder_spec);

if (error != 0)
{
    // … error handling
}

int field_count = 1; // progressive
CFNumberRef field_count_val = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &field_count);
VTSessionSetProperty(session, kVTCompressionPropertyKey_FieldCount, field_count_val);
CFRelease(field_count_val);

VTSessionSetProperty(session, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanFalse);

int max_frame_delay_count = 0; // encode frames in order
CFNumberRef max_frame_delay_count_val = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &max_frame_delay_count);
VTSessionSetProperty(session, kVTCompressionPropertyKey_MaxFrameDelayCount, max_frame_delay_count_val);
CFRelease(max_frame_delay_count_val);

float quality = 0.74f; // highest quality that defaults to YUV420
CFNumberRef quality_val = CFNumberCreate(kCFAllocatorDefault, kCFNumberFloatType, &quality);
VTSessionSetProperty(session, kVTCompressionPropertyKey_Quality, quality_val);
CFRelease(quality_val);

The pixel buffers are created as follows (kCVPixelFormatType_32BGRA is the current CoreVideo name for the legacy QuickTime constant k32BGRAPixelFormat):

CVPixelBufferCreate(kCFAllocatorDefault, image_width, image_height, kCVPixelFormatType_32BGRA, NULL, &px_buf);

Or when using YUV420 pixel buffers:

CVPixelBufferCreate(kCFAllocatorDefault, image_width, image_height, kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, NULL, &yuv_px_buf);

And each frame encoding is kicked off with this call:

OSStatus error = VTCompressionSessionEncodeFrame(
    session, img_buffer, timestamp, kCMTimeInvalid /* duration */,
    NULL /* frame properties */, NULL /* frame refcon */, &flags);
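One mitigation we considered, independent of VideoToolbox: parse the SOF marker of each encoded frame in the output callback, so a silent threshold change by Apple at least fails loudly instead of quietly shipping 4:2:2 frames. A rough sketch (plain JPEG header parsing, no Apple API; the function and the demo bytes are ours):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper (not a VideoToolbox API): returns the luma
 * sampling-factor byte (H << 4 | V) from the first SOF0/1/2 segment of
 * a JPEG, or 0 if none is found before the scan data.  With 1x1 chroma
 * components (the usual case): 0x22 => 4:2:0, 0x21 => 4:2:2,
 * 0x11 => 4:4:4. */
static uint8_t jpeg_luma_sampling(const uint8_t *data, size_t size)
{
    if (size < 4 || data[0] != 0xFF || data[1] != 0xD8)
        return 0; /* not a JPEG (no SOI marker) */
    size_t i = 2;
    while (i + 3 < size) {
        if (data[i] != 0xFF)
            return 0; /* lost sync: malformed header */
        uint8_t marker = data[i + 1];
        if (marker == 0xC0 || marker == 0xC1 || marker == 0xC2) {
            /* SOF payload: length(2) precision(1) height(2) width(2)
             * ncomponents(1), then 3 bytes per component; the sampling
             * byte is the 2nd byte of the first (luma) component. */
            size_t sampling = i + 2 + 2 + 1 + 2 + 2 + 1 + 1;
            return sampling < size ? data[sampling] : 0;
        }
        if (marker == 0xDA)
            return 0; /* start of scan: no SOF seen */
        i += 2 + (((size_t)data[i + 2] << 8) | data[i + 3]); /* skip segment */
    }
    return 0;
}

/* Tiny synthetic header for demonstration: SOI, an APP0 stub, then a
 * SOF0 for a 16x16 3-component image with 2x2 luma sampling (4:2:0). */
static const uint8_t demo_jpeg_header[] = {
    0xFF, 0xD8,                         /* SOI */
    0xFF, 0xE0, 0x00, 0x04, 0x4A, 0x46, /* APP0, length 4 */
    0xFF, 0xC0, 0x00, 0x11,             /* SOF0, length 17 */
    0x08, 0x00, 0x10, 0x00, 0x10, 0x03, /* 8-bit, 16x16, 3 components */
    0x01, 0x22, 0x00,                   /* Y:  2x2 sampling */
    0x02, 0x11, 0x01,                   /* Cb: 1x1 */
    0x03, 0x11, 0x01,                   /* Cr: 1x1 */
};
```

Checking `jpeg_luma_sampling(demo_jpeg_header, sizeof demo_jpeg_header) == 0x22` in the callback (against the real encoded bytes) would catch a mode flip as soon as it happens.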

Solution

  • I eventually made myself a little more familiar with VideoToolbox and found that you can get all the configurable properties for an encoder using the VTCopySupportedPropertyDictionaryForEncoder() function. For the JPEG encoders, this doesn't include anything that controls the output pixel format.

    "JPEG encoders", plural? Yes: since the introduction of Apple Silicon, there's also a hardware JPEG encoder exposed via VideoToolbox. It behaves slightly differently from the software encoder: its output appears to be YUV 4:2:0 regardless of the quality setting. That's at least more consistent, but it means the hardware encoder can't be used to produce 4:4:4 JPEGs either.

    My eventual solution was to write my own GPU-based JPEG encoder using Metal compute shaders. That way, we control exactly what we get, and it's just as fast.
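    For what it's worth, the piece of an encoder that actually decides the subsampling mode is tiny: 4:2:0 simply means each chroma plane is reduced by a 2x2 box average before the DCT stage. A rough CPU sketch of that one step (illustrative only; our Metal kernel does the equivalent work per threadgroup):

```c
#include <stdint.h>
#include <stddef.h>

/* The chroma-reduction step of a 4:2:0 encoder: a plain 2x2 box
 * average over one chroma plane.  w and h are the plane's full
 * dimensions (assumed even); dst must hold (w/2)*(h/2) samples. */
static void downsample_chroma_420(const uint8_t *src, size_t w, size_t h,
                                  uint8_t *dst)
{
    for (size_t y = 0; y < h; y += 2)
        for (size_t x = 0; x < w; x += 2) {
            unsigned sum = src[y * w + x] + src[y * w + x + 1]
                         + src[(y + 1) * w + x] + src[(y + 1) * w + x + 1];
            /* round-to-nearest average of the 2x2 block */
            dst[(y / 2) * (w / 2) + x / 2] = (uint8_t)((sum + 2) / 4);
        }
}
```

    Skipping this step (and emitting 1x1 sampling factors in the SOF segment instead) is all it takes to produce 4:4:4 output, which is exactly the control VideoToolbox doesn't expose.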