I'm using VTDecompressionSession to decode an H.264 stream received over a network, and I need to copy the YUV data out of the image buffer handed to my output callback. I've verified that the imageBuffer's typeID equals CVPixelBufferGetTypeID().
But whenever I try to retrieve the base address of the buffer, or of any of its planes, it always comes back NULL. The OSStatus that iOS passes in is 0, so my assumption is that nothing is wrong here. Maybe I just don't know how to extract the data. Can anyone help?
void decompressionCallback(void * CM_NULLABLE decompressionOutputRefCon,
                           void * CM_NULLABLE sourceFrameRefCon,
                           OSStatus status,
                           VTDecodeInfoFlags infoFlags,
                           CM_NULLABLE CVImageBufferRef imageBuffer,
                           CMTime presentationTimeStamp,
                           CMTime presentationDuration)
{
    CFShow(imageBuffer);
    size_t dataSize = CVPixelBufferGetDataSize(imageBuffer);
    void * decodedBuffer = CVPixelBufferGetBaseAddress(imageBuffer);
    memcpy(pYUVBuffer, decodedBuffer, dataSize);
}
Edit: Also, here's a dump of the CVImageBufferRef object. One thing that seems fishy is that I would expect there to be three planes (Y, U, and V), but there are only two. My expectation was to use CVPixelBufferGetBaseAddressOfPlane
to extract each plane of data. I'm implementing this to remove the dependency on a separate software codec, and the rest of my rendering pipeline requires the planes to be extracted this way.
{type = immutable dict, count = 5, entries => 0 : {contents = "PixelFormatDescription"} = {type = immutable dict, count = 10, entries => 0 : {contents = "Planes"} = {type = mutable-small, count = 2, values = ( 0 : {type = mutable dict, count = 3, entries => 0 : {contents = "FillExtendedPixelsCallback"} = {length = 24, capacity = 24, bytes = 0x000000000000000030139783010000000000000000000000} 1 : {contents = "BitsPerBlock"} = {value = +8, type = kCFNumberSInt32Type} 2 : {contents = "BlackBlock"} = {length = 1, capacity = 1, bytes = 0x10} }
1 : {type = mutable dict, count = 5, entries => 2 : {contents = "HorizontalSubsampling"} = {value = +2, type = kCFNumberSInt32Type} 3 : {contents = "BlackBlock"} = {length = 2, capacity = 2, bytes = 0x8080} 4 : {contents = "BitsPerBlock"} = {value = +16, type = kCFNumberSInt32Type} 5 : {contents = "VerticalSubsampling"} = {value = +2, type = kCFNumberSInt32Type} 6 : {contents = "FillExtendedPixelsCallback"} = {length = 24, capacity = 24, bytes = 0x0000000000000000ac119783010000000000000000000000} }
)} 2 : {contents = "IOSurfaceOpenGLESFBOCompatibility"} = {value = true} 3 : {contents = "ContainsYCbCr"} = {value = true} 4 : {contents = "IOSurfaceOpenGLESTextureCompatibility"} = {value = true} 5 : {contents = "ComponentRange"} = {contents = "VideoRange"} 6 : {contents = "PixelFormat"} = {value = +875704438, type = kCFNumberSInt32Type} 7 : {contents = "IOSurfaceCoreAnimationCompatibility"} = {value = true} 9 : {contents = "ContainsAlpha"} = {value = false} 10 : {contents = "ContainsRGB"} = {value = false} 11 : {contents = "OpenGLESCompatibility"} = {value = true} }
2 : {contents = "ExtendedPixelsRight"} = {value = +8, type = kCFNumberSInt32Type} 3 : {contents = "ExtendedPixelsTop"} = {value = +0, type = kCFNumberSInt32Type} 4 : {contents = "ExtendedPixelsLeft"} = {value = +0, type = kCFNumberSInt32Type} 5 : {contents = "ExtendedPixelsBottom"} = {value = +0, type = kCFNumberSInt32Type} } propagatedAttachments={type = mutable dict, count = 7, entries => 0 : {contents = "CVImageBufferChromaLocationTopField"} = Left 1 : {contents = "CVImageBufferYCbCrMatrix"} = {contents = "ITU_R_601_4"} 2 : {contents = "ColorInfoGuessedBy"} = {contents = "VideoToolbox"} 5 : {contents = "CVImageBufferColorPrimaries"} = SMPTE_C 8 : {contents = "CVImageBufferTransferFunction"} = {contents = "ITU_R_709_2"} 10 : {contents = "CVImageBufferChromaLocationBottomField"} = Left 12 : {contents = "CVFieldCount"} = {value = +1, type = kCFNumberSInt32Type} } nonPropagatedAttachments={type = mutable dict, count = 0, entries => }
So your format is kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange = '420v'
(875704438 is the 32-bit value of that four-character code), and two planes makes sense for 4:2:0 YUV data because the first plane is a full-size, single-channel Y bitmap and the second is a half-width, half-height, two-channel CbCr bitmap.
You are right: for planar data you ought to call CVPixelBufferGetBaseAddressOfPlane
, although you should also be able to use CVPixelBufferGetBaseAddress
and interpret its result as a CVPlanarPixelBufferInfo_YCbCrBiPlanar
struct. So the likely problem is that you're not calling CVPixelBufferLockBaseAddress
before the CVPixelBufferGetBaseAddress*
calls, nor CVPixelBufferUnlockBaseAddress
afterwards.
From here you can efficiently display the two YUV planes using Metal or OpenGL by writing some fun YUV->RGB shader code.