
Video Lag and FPS drop on Android Lollipop on touching screen


I am using MediaCodec to play 1080p@60fps video on a Freescale SabreSD platform running Android Lollipop 5.1.

Initially, because of BufferQueue synchronous mode, the FPS was well below 60. I can now play at 70 FPS by changing the BufferQueue to asynchronous mode, as in JB.

The next challenge is that the video lags and the FPS drops drastically, to about 40, when I start interacting with the screen (pulling down the notification bar, pressing the volume button, etc.).

So I ran Grafika's MultiSurfaceActivity and Record GL tests. All the tests play smoothly when the screen is not touched or disturbed, but as soon as I start scrolling the notification bar from the top and keep doing so for a long time, the frame rate drops to 35-40 FPS.

I have run the same tests on KitKat 4.4.2 and JB 4.2.2, and they seem to work fine.

The behaviour is the same when playing an MP4 from Gallery: the video gets stuck and lags a lot when we start playing with the notification bar.

Can anyone explain what has changed from KitKat to Lollipop that could cause this issue (VSync, triple buffering)?


Solution

  • Regurgitating a bit from the Grafika issue tracker:

    The bouncing ball is software-rendered, so anything that soaks up CPU time is going to make it slow down. On devices with medium-grade CPUs and big displays (e.g. Nexus 10) it never gets close to 60fps. So a slowdown while you are playing with the nav bar doesn't surprise me, but if it continues to be slow even after you stop playing with the nav bar, then that's a little weird.

    Video playback should be less affected, as that does less with the CPU.

    Investigation into such problems usually begins by using systrace to capture traces in "good" and "bad" states, and comparing the two.

    The key point of BufferQueue "async mode" is to allow frames to drop if the consumer can't keep up with the producer. It's primarily meant for SurfaceTexture, where producer and consumer are in the same app, potentially on the same thread, so having the producer stall waiting for the consumer could cause the program to hang. I'm not sure what you mean by needing it to exceed 60fps, but I would guess you're throwing frames at the display faster than it can render them... so you're not really increasing the frame rate, you're just using the BufferQueue to drop the frames instead of using Choreographer to decide when you need to drop them yourself.
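
    To make the point about async mode concrete, here is a minimal Java sketch of a toy model (not the real BufferQueue code): the producer queues frames at a fixed rate, and at each display refresh the consumer latches the newest queued frame and discards anything older. The class and method names are made up for illustration, and the model assumes perfectly regular timing on both sides.

```java
// Toy model of "async mode": the display shows at most one frame per refresh,
// and any older frames still queued at that point are dropped.
// AsyncQueueModel and simulate() are hypothetical names, not Android APIs.
final class AsyncQueueModel {

    /**
     * Returns {shown, dropped} for a producer running at producerFps
     * feeding a display refreshing at displayHz for the given duration.
     */
    static int[] simulate(int producerFps, int displayHz, int seconds) {
        long totalFrames = (long) producerFps * seconds;
        long totalVsyncs = (long) displayHz * seconds;
        int shown = 0;
        long lastShown = -1;

        for (long v = 1; v <= totalVsyncs; v++) {
            // Frame i is queued at time i / producerFps; refresh v happens at
            // v / displayHz. The newest frame queued by refresh v is therefore
            // floor(v * producerFps / displayHz), capped at the last frame.
            long newest = Math.min(totalFrames - 1, (v * producerFps) / displayHz);
            if (newest > lastShown) {
                shown++;            // a new frame reaches the screen
                lastShown = newest; // everything older is skipped
            }
        }
        int dropped = (int) (totalFrames - shown);
        return new int[] {shown, dropped};
    }

    public static void main(String[] args) {
        int[] r = simulate(70, 60, 1);
        System.out.println("shown=" + r[0] + " dropped=" + r[1]);
    }
}
```

    With a 70fps producer and a 60Hz display, this model shows 60 frames in one second and drops the other 10, which is why counting queued frames overstates the frame rate that actually reaches the screen.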

    In any event, I left Google back in June 2014, well before Lollipop was completed. If something works correctly on KitKat but weirdly on Lollipop, I'm afraid I can't provide much insight. If you can reproduce the behavior easily, it might be worth capturing a video that demonstrates the problem (point a second smart phone at the device exhibiting the problem, so they can see how you manipulate the device) and filing a bug on http://b.android.com/.


    Some traces uploaded by the OP:

    Looking at the kitkat trace, something weird is going on in SurfaceFlinger. The main thread is sitting in postFrameBuffer for a very long time (23-32ms). It eventually wakes up, and the CPU row suggests it was waiting on activity from a "galcore daemon", which I'm not familiar with (seems particular to Vivante GPU).

    The lollipop traces only show the CPU rows, as if the capture were done without the necessary tags. I don't believe the systrace capture command changed significantly between kitkat and lollipop, so I'm puzzled as to why the user-space-initiated logging would vanish but the kernel thread scheduling stuff would remain. Make sure you have sched gfx view specified.


    The newer lollipop traces only have about a second of good data. When you see "Did Not Finish" it means a "start" record had no matching "end" record. You can increase the systrace logging buffer size with the -b flag. I think there's enough there though.
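
    As an illustration of what "a start record with no matching end record" means, here is a small Java sketch that pairs begin and end markers with a stack and reports whatever is left open. The event encoding ("B:name" to open a slice, "E" to close the most recent one) is a simplification invented for this example, not the actual trace file format, and the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: finds slices that were opened but never closed,
// which is the situation a trace viewer would label "Did Not Finish".
final class UnfinishedSlices {

    /** Returns the names of slices left open, outermost first. */
    static List<String> unfinished(List<String> events) {
        Deque<String> open = new ArrayDeque<>();
        for (String e : events) {
            if (e.startsWith("B:")) {
                open.push(e.substring(2)); // begin: remember the slice name
            } else if (e.equals("E") && !open.isEmpty()) {
                open.pop();                // end: closes the most recent begin
            }
        }
        // The deque iterates from the most recently opened slice to the
        // oldest, so reverse it to list the outermost slice first.
        List<String> result = new ArrayList<>(open);
        Collections.reverse(result);
        return result;
    }
}
```

    If the logging buffer fills up before the "E" for a slice is written, that slice ends up in the returned list; a larger buffer gives the matching end records room to be recorded.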

    Looking at the /system/bin/surfaceflinger row you can see that, in the "good" trace, postFrameBuffer usually finishes in about 16ms, but it's still waiting on galcore. Zoom in on 388ms (use WASD keys). At 388.196ms, on the CPU 2 row, you can see galcore do something. Right after it completes, the thin line at the top of the surfaceflinger row changes from light grey (sleeping) to green (running). At 388.548ms, again on CPU 2, galcore runs again, and right after that on the surfaceflinger row you see queueBuffer start to execute.

    The "bad" trace looks identical. For example, you can see two galcore executions at 101.146ms and 101.666ms, with what appear to be similar effects on the surfaceflinger row. The key difference is the time spent in postFrameBuffer, which is around 16ms for "good" and around 30ms for "bad".

    So this doesn't appear to be a behavioral shift; rather, things are taking longer and deadlines are being missed.
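
    A rough way to see why about 16ms versus about 30ms matters so much: on a 60Hz display each refresh period is about 16.7ms, and work that overruns one period has to wait for the next one. The Java sketch below is a simplified model under that assumption (fixed 60Hz refresh, one frame in flight at a time); the names are made up for illustration.

```java
// Simplified deadline model: per-frame work is rounded up to a whole number
// of refresh periods. VsyncBudget is a hypothetical name, not an Android API.
final class VsyncBudget {
    static final double REFRESH_HZ = 60.0;
    static final double VSYNC_PERIOD_MS = 1000.0 / REFRESH_HZ; // about 16.7ms

    /** Number of refresh periods a frame occupies if its work takes workMs. */
    static int periodsConsumed(double workMs) {
        return Math.max(1, (int) Math.ceil(workMs / VSYNC_PERIOD_MS));
    }

    /** Steady-state frame rate if every frame takes workMs. */
    static double effectiveFps(double workMs) {
        return REFRESH_HZ / periodsConsumed(workMs);
    }

    public static void main(String[] args) {
        System.out.println("16ms -> " + effectiveFps(16.0) + " fps");
        System.out.println("30ms -> " + effectiveFps(30.0) + " fps");
    }
}
```

    In this model 16ms of work still fits in one refresh period (60fps), while 30ms takes two (30fps). A mix of frames that do and do not make the deadline would land somewhere in between.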

    As far as I can tell, SurfaceFlinger is being held up by galcore daemon. This is true in both "good" and "bad" cases. To see what the timing should look like you can run systrace on a Nexus device, or compare to traces from other devices (e.g. the one in this case study or this SO question). If you zoom in you can see doComposition executing in a few milliseconds, and postFrameBuffer finishing in a few tenths of a millisecond.

    Summing up: you don't have good and bad, you have bad and worse. :-) I don't know what galcore is, but you'll likely need to have a conversation with the GPU OEM.