I am trying to write an Android video conference app using an OpenMAX codec. After working my way through OpenMAX IL for AVC decoding, I found a large latency between sending the empty-buffer command and receiving the fill-buffer-done callback. My test case is a 4CIF H.264 elementary stream without B-slices. My OMX calling sequence is:
The log output indicates a latency of 8 frames: the empty buffer #9 command is sent before the FILL_BUFFER_DONE #1 message arrives. I have tested this on a Samsung Note 2, an HTC One X and some other phones, and all of them show a large decoding latency.
This latency is too large to be acceptable for a video conference app. Can anyone help me shorten it?
The log output is:
I/java:TestKdavc(19867): video test started
I/java:TestKdavc(19867): set video source: /sdcard/DCIM/vidrev.dat
I/testkdavc/testkdavc.cpp(19867): [start@331] frame dimesion: 704 x 576
I/OMXClient(19867): Using client-side OMX mux.
D/avc/omxctrl.cpp(19867): [InitNode@671] mComType = 1
D/avc/omxctrl.cpp(19867): [createNode@516] tid = 1074982704
D/avc/omxctrl.cpp(19867): [createNode@519] m_node = 4136a16c
D/avc/omxctrl.cpp(19867): [getVideoPortInfo@290] nPorts = 2, iport = 0, oport = 1
D/avc/omxctrl.cpp(19867): [createNode@549] mComType = 1, port = 0, info.nBufferCountActual = 5, info.nBufferSize = 50688, info.nBufferCountMin = 5
D/avc/omxctrl.cpp(19867): [createNode@582] mComType = 1, port = 1, info.nBufferCountActual = 2, info.nBufferSize = 608256, info.nBufferCountMin = 2
D/avc/omxctrl.cpp(19867): [allocatePortBuffers@321] mComType = 1, portIndex = 0, def.nBufferCountActual = 5, def.nBufferSize = 608256, def.nBufferCountMin = 5, buffersize = 608256
D/avc/omxctrl.cpp(19867): [allocatePortBuffers@340] before useBuffer
D/avc/omxctrl.cpp(19867): [allocatePortBuffers@340] before useBuffer
D/avc/omxctrl.cpp(19867): [allocatePortBuffers@340] before useBuffer
D/avc/omxctrl.cpp(19867): [allocatePortBuffers@340] before useBuffer
D/avc/omxctrl.cpp(19867): [allocatePortBuffers@340] before useBuffer
D/avc/omxctrl.cpp(19867): [allocatePortBuffers@321] mComType = 1, portIndex = 1, def.nBufferCountActual = 2, def.nBufferSize = 608256, def.nBufferCountMin = 2, buffersize = 608256
D/avc/omxctrl.cpp(19867): [allocatePortBuffers@336] before allocateBufferWithBackup
D/avc/omxctrl.cpp(19867): [allocatePortBuffers@336] before allocateBufferWithBackup
D/avc/omxctrl.cpp(19867): [onMessage@96] mComType: 1, OMX_CommandStateSet, state: 2
I/avc/omxctrl.cpp(19867): [onMessage@131] message type: EVENT
D/avc/omxctrl.cpp(19867): [onMessage@96] mComType: 1, OMX_CommandStateSet, state: 3
I/avc/omxctrl.cpp(19867): [onMessage@131] message type: EVENT
D/avc/omxctrl.cpp(19867): [createNode@626] mComType = 1, m_vecOutputBuffers.size() = 2, err = 0
I/testkdavc/testkdavc.cpp(19867): [start@365] found AVC/H264 decoder: OMX.SEC.AVC.Decoder, color format: OMX_COLOR_FormatYUV420Planar
I/testkdavc/testkdavc.cpp(19867): [start@376] start feed
I/avc/omxctrl.cpp(19867): [PushData@489] empty buffer #1
I/avc/omxctrl.cpp(19867): [PushData@489] empty buffer #2
I/avc/omxctrl.cpp(19867): [PushData@489] empty buffer #3
I/avc/omxctrl.cpp(19867): [PushData@489] empty buffer #4
I/avc/omxctrl.cpp(19867): [PushData@489] empty buffer #5
I/avc/omxctrl.cpp(19867): [fillBufferThreadEntry@785] fill buffer #1
I/avc/omxctrl.cpp(19867): [fillBufferThreadEntry@785] fill buffer #2
I/avc/omxctrl.cpp(19867): [handleBufferMessage@159] message type: EMPTY_BUFFER_DONE #1
I/avc/omxctrl.cpp(19867): [PushData@489] empty buffer #6
I/avc/omxctrl.cpp(19867): [handleBufferMessage@159] message type: EMPTY_BUFFER_DONE #2
I/avc/omxctrl.cpp(19867): [PushData@489] empty buffer #7
I/avc/omxctrl.cpp(19867): [handleBufferMessage@159] message type: EMPTY_BUFFER_DONE #3
I/avc/omxctrl.cpp(19867): [PushData@489] empty buffer #8
I/avc/omxctrl.cpp(19867): [handleBufferMessage@159] message type: EMPTY_BUFFER_DONE #4
I/avc/omxctrl.cpp(19867): [PushData@489] empty buffer #9
I/avc/omxctrl.cpp(19867): [handleBufferMessage@189] message type: FILL_BUFFER_DONE #1
I/testkdavc/testkdavc.cpp(19867): [OnFrame@150] get frame #1 of 704 x 576
I/avc/omxctrl.cpp(19867): [handleBufferMessage@159] message type: EMPTY_BUFFER_DONE #5
I/avc/omxctrl.cpp(19867): [fillBufferThreadEntry@785] fill buffer #3
I/avc/omxctrl.cpp(19867): [PushData@489] empty buffer #10
I/avc/omxctrl.cpp(19867): [handleBufferMessage@189] message type: FILL_BUFFER_DONE #2
I/testkdavc/testkdavc.cpp(19867): [OnFrame@150] get frame #2 of 704 x 576
I/avc/omxctrl.cpp(19867): [handleBufferMessage@159] message type: EMPTY_BUFFER_DONE #6
I/testkdavc/testkdavc.cpp(19867): [start@426] retry put data
I/avc/omxctrl.cpp(19867): [handleBufferMessage@189] message type: FILL_BUFFER_DONE #3
I/testkdavc/testkdavc.cpp(19867): [OnFrame@150] get frame #3 of 704 x 576
I would not focus on latency counted in frames, but rather measure it in time units and then try to identify where it is generated. It may be that there is a threshold on the output buffer queue, so that FBD (fill buffer done) is not sent immediately; I have seen such an implementation in some platform vendor code. It may also be a characteristic of the internal H.264 decoding unit.
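One way to get latency in time units is to stamp each input buffer when it is submitted and look the stamp up again when the matching output arrives. The following is only a minimal sketch: `LatencyTracker` and its method names are hypothetical and are not part of OpenMAX IL or of the code in the question. It also assumes that the decoder carries the input buffer's timestamp through to the corresponding output buffer, which should be verified for each vendor component.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical helper (not an OpenMAX API): pairs each submitted input
// buffer with its decoded output by presentation timestamp.
class LatencyTracker {
public:
    using Clock = std::chrono::steady_clock;

    // Call right before the empty-buffer command is issued.
    // ptsUs is the timestamp written into the input buffer.
    void onEmptyBuffer(int64_t ptsUs, Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(mMutex);
        mSubmitted[ptsUs] = now;
    }

    // Call from the FILL_BUFFER_DONE handler with the output timestamp.
    // Returns the latency in microseconds, or -1 if the timestamp was
    // never submitted (assumption violated or frame dropped).
    int64_t onFillBufferDone(int64_t ptsUs, Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(mMutex);
        auto it = mSubmitted.find(ptsUs);
        if (it == mSubmitted.end()) {
            return -1;
        }
        int64_t latencyUs = std::chrono::duration_cast<std::chrono::microseconds>(
                                now - it->second).count();
        mSubmitted.erase(it);
        return latencyUs;
    }

    // Number of frames submitted but not yet returned by the decoder.
    std::size_t inFlight() const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mSubmitted.size();
    }

private:
    mutable std::mutex mMutex;
    std::map<int64_t, Clock::time_point> mSubmitted;
};
```

Logging the returned value together with `inFlight()` on every FILL_BUFFER_DONE shows both the wall-clock delay and how many frames the component is holding back at that moment. The callbacks run on different threads in a typical OMX client, hence the mutex.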
I do not have the code for Tegra (note), but the Exynos implementation is available by default from AOSP. Assuming that you are able to build and upload the *.so, I would start by taking some measurements in I-frame decoding mode. On Exynos (as in other cases) this mode is triggered by thumbnail mode, but be aware that integrators quite frequently set Google's decoder as the default for thumbnail creation. In that case you must either get rid of that default or run thumbnail creation on a high profile stream (Google's codec will fail, since AFAIR it supports only main and baseline, and decoding will then continue with the vendor's codec).
You can also set IFrameMode for regular playback; look at the decode/omx Exynos implementation on the master branch for reference. That is, you need to send V4L2_CID_MPEG_MFC51_VIDEO_I_FRAME_DECODING in the codec config phase.
IMHO the latency in I-frame mode will be a kind of asymptote for regular decoding (without advanced optimizations). As a next step, take timing measurements in the lower layers, including the kernel. Comparing all of these results with regular decoding will give you a complete picture of whether, where and how the latency can be optimized.