I have developed three versions of a video processing algorithm using the Android NDK: plain C, RenderScript (using the C++ API), and NEON intrinsics. The Java front end calls down into the native (NDK) level for each of them. I found that, for some reason, the NEON version consumes a lot more power than the C and RenderScript versions. I used Trepn 5.0 for my power testing.
Can someone clarify the expected power consumption for each of these approaches: C, RenderScript (GPU), and NEON intrinsics? Which one consumes the most?
What would be the ideal power consumption level for RenderScript code? Since the GPU runs at a lower clock frequency, shouldn't its power consumption be lower?
Video - 1920x1080 (20 frames)
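For context, the native entry point is wired up roughly as follows. This is a minimal sketch with hypothetical package, class, and function names, not the actual project code.

    #include <jni.h>

    /* Hypothetical JNI bridge: the Java front end hands one frame buffer to
     * native code, which dispatches to the selected implementation
     * (0 = C, 1 = RenderScript, 2 = NEON intrinsics). */
    JNIEXPORT void JNICALL
    Java_com_example_videoproc_NativeBridge_processFrame(JNIEnv *env, jobject thiz,
                                                         jbyteArray frame,
                                                         jint width, jint height,
                                                         jint mode)
    {
        (void)thiz;
        jbyte *pixels = (*env)->GetByteArrayElements(env, frame, NULL);
        if (pixels == NULL) {
            return;  /* OutOfMemoryError has already been thrown */
        }

        /* process_frame_c(), process_frame_rs(), process_frame_neon() would be
         * the three implementations under test; omitted here. */
        (void)width; (void)height; (void)mode;

        /* Release mode 0 copies the (possibly modified) buffer back to the
         * Java array and frees the native copy. */
        (*env)->ReleaseByteArrayElements(env, frame, pixels, 0);
    }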
First, the power consumption of RenderScript code depends on the type of SoC and on the frequencies/voltages at which its CPUs and GPUs operate.
Even among CPUs from the same vendor, ARM for instance, an A15 is more power hungry than an A9 for the same task. Similarly, a Mali 4xx GPU versus a 6xx exhibits power-consumption differences for the same work. Power deltas also exist between vendors, for instance between Intel and ARM CPUs doing the same task, and you would notice power differences between a Qualcomm Adreno GPU and an ARM Mali GPU even if they are operating at the same frequency/voltage levels.
On a Nexus 5, you get a quad-core A15-class CPU cranking along at 2.3 GHz per core. RenderScript pushes CPUs and GPUs to their highest clock speeds. So on this device I would expect the power consumption of RS code running on the CPU/NEON (or just the CPU) to be highest, depending on the type of operations you are doing, followed by the RS GPU code. Bottom line: for power consumption, the device you use matters a lot because of the differences in the SoCs they carry. On the latest generation of SoCs out there, I expect CPU/NEON code to be more power hungry than the GPU.
RS will push the CPU/GPU clock frequency to the highest possible speed, so I am not sure one can do meaningful power optimization here. Even if you could, those savings would be minuscule compared to the power the CPUs/GPU consume at their top speed.
Power consumption is such a huge problem on mobile devices that, from a power standpoint, you would probably be fine using your filters to process a few frames in a computational-imaging scenario. But the moment you run RenderScript on real video, the device heats up quickly even at lower resolutions, and the OS thermal managers come into play. These thermal managers reduce the overall CPU speeds, which makes the performance of CPU RenderScript unreliable.
Responses to comments
Frequency alone does not determine power consumption; it is the combination of frequency and voltage. For instance, a GPU running at 200 MHz at 1.25 V and at 550 MHz at 1.25 V will likely consume similar power. Depending on how the power domains are designed in the system, something like 0.9 V should be enough for 200 MHz, and the system should in theory transition the GPU power domain to a lower voltage when the frequency comes down. But various SoCs have various issues, so one cannot guarantee a consistent voltage/frequency transition. That could be one reason GPU power stays high even under nominal loads.
So, whatever the complexities, if the GPU voltage is held at something like 1.25 V at 600 MHz, your power consumption will be pretty high, comparable to that of CPUs cranking along at 2 GHz at 1.25 V...
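To make the voltage point concrete, here is the usual first-order CMOS approximation (a textbook model, not measured data for any particular SoC), where alpha is the switching activity, C the switched capacitance, V the supply voltage, f the clock frequency, and I_leak the leakage current:

    P \approx \underbrace{\alpha\, C\, V^{2}\, f}_{\text{dynamic (switching)}} + \underbrace{V\, I_{\text{leak}}}_{\text{static (leakage)}}

Dynamic power grows with the square of the voltage, and the leakage term is paid at that voltage regardless of frequency, which is why a GPU rail parked at 1.25 V stays expensive even at modest clock speeds.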
I tested NEON intrinsics for a 5x5 convolve and they are pretty fast (3x-5x) compared to the plain CPU implementation of the same task. NEON hardware usually sits in the same power domain as the CPUs (the MPU power domain), so all CPUs are held at that voltage/frequency even when only the NEON hardware is working. Since NEON finishes the given task faster than the plain CPU code, I wouldn't be surprised if it draws relatively more power than the CPU alone for that task. Something has to give when you get faster performance, and here it is power.
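For reference, the kind of intrinsics involved look roughly like this: a minimal sketch of one row of a 5x5 convolution over 8-bit pixels, not the code that was actually benchmarked, with made-up names and the assumption that the kernel taps are small enough for a 16-bit accumulator.

    #include <arm_neon.h>
    #include <stdint.h>

    /* Illustrative only: accumulate one row of a 5-tap kernel for 8 output
     * pixels at once. vmull_u8/vmlal_u8 widen the 8-bit pixel*coefficient
     * products to 16 bits while multiplying, so eight outputs advance per
     * instruction. Assumes the taps are small (e.g. a normalized blur) so
     * the 16-bit accumulator does not overflow. */
    static inline uint16x8_t convolve_row_u8x8(const uint8_t *src_row,
                                               const uint8_t taps[5])
    {
        uint16x8_t acc = vmull_u8(vld1_u8(src_row + 0), vdup_n_u8(taps[0]));
        acc = vmlal_u8(acc, vld1_u8(src_row + 1), vdup_n_u8(taps[1]));
        acc = vmlal_u8(acc, vld1_u8(src_row + 2), vdup_n_u8(taps[2]));
        acc = vmlal_u8(acc, vld1_u8(src_row + 3), vdup_n_u8(taps[3]));
        acc = vmlal_u8(acc, vld1_u8(src_row + 4), vdup_n_u8(taps[4]));
        return acc;
    }

The full 5x5 filter would call this for the five source rows around each output row and sum the five partial accumulators; that structure is exactly what keeps the NEON pipelines, and hence the MPU power domain, fully busy.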