To be sure, I tested my code in the following two development setups:
Development OS: Windows 7 32-bit
Phone: Nexus 5
Phone OS version: Android 4.4 and Android 4.4.1
SDK bundle: adt-bundle-windows-x86-20131030
Build-tool version: 19
SDK tool version: 22.3
Platform tool version: 19
and
Development OS: Windows 7 32-bit
Phone: Nexus 5
Phone OS version: Android 4.4 and Android 4.4.1
SDK bundle: adt-bundle-windows-x86-20130729
Build-tool version: 18.1
SDK tool version: 22.2.1
Platform tool version: 18.0.1
The code is very simple:
    #pragma rs_fp_relaxed
    uchar4 __attribute__((kernel)) sample(uchar4 in, uint32_t x, uint32_t y) {
        // not const: fin.x is assigned below
        float4 fin = convert_float4(in);
        float tmp = pow(2.f, 2.f); // very slow on GPU
        fin.x = tmp;
        return convert_uchar4(fin);
    }
The script runs on the GPU automatically. The problem is that the pow() function is very slow there: on a 1600x1067 image, the script takes 1927 ms on the GPU. If I use adb to force the code to run on the CPU, it takes only 10 ms to 12 ms. If I comment out the pow() call, it runs fast on both CPU and GPU. I also tried the alternative powr() and pown() functions, and the result was the same. I also tried including:
#include "rs_cl.rsh"
and the result was the same.
I'm wondering if this is the expected behavior. Thank you in advance.
Two things:
pow() and similar functions are generally very slow on GPUs due to precision requirements. If your precision requirements are less strict, you can use native_powr(), which is often dramatically faster.
If you comment out the pow(), you might not be doing anything except a memcpy, because the compiler will optimize out a lot in those cases. So that timing is not a fair baseline. But yes, pow() is very slow.
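For reference, here is a minimal sketch of the kernel from the question with pow() swapped for native_powr(). It is untested on the Nexus 5 setups listed above. The two leading pragmas and the package name are assumptions added only to make the file complete. Note also that native_powr() is an approximation, expects a non-negative base, and has a limited documented input range, so check the results against pow() for the values you actually use.

```c
#pragma version(1)
#pragma rs java_package_name(com.example.powtest) // hypothetical package name
#pragma rs_fp_relaxed

uchar4 __attribute__((kernel)) sample(uchar4 in, uint32_t x, uint32_t y) {
    float4 fin = convert_float4(in);
    // native_powr() trades precision for speed; the base must be non-negative.
    float tmp = native_powr(2.f, 2.f);
    fin.x = tmp;
    return convert_uchar4(fin);
}
```

If the timing is still far from the CPU number, it may be worth timing a kernel that does some other non-trivial per-pixel work, since a kernel with the pow() call removed may collapse to a plain copy.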