I am struggling to compile the Eigen library for the iPhone 4, which has an ARM processor with the armv7 instruction set. Everything works fine as long as I specify the preprocessor define EIGEN_DONT_VECTORIZE. But because of performance issues I would like to use armv7-optimised code.
Regardless of which compiler I use, LLVM-GCC 4.2 or LLVM Clang 2.0, I always run into compilation errors. As far as I can tell (I may be wrong), LLVM-GCC 4.2 is the only one of the two that gives access to the ARM NEON specific instructions.
When I do not set EIGEN_DONT_VECTORIZE (and provide -mfloat-abi=softfp -mfpu=neon to gcc) I get the following gcc compiler error:
src/m3CoreLib/Eigen/src/Core/arch/NEON/PacketMath.h:89: error: expected unqualified-id before '__extension__'
I have read about issues with the "old" gcc 4.2 and the recommendation to use a newer version of gcc. I am not sure, but I believe this is not an option because of App Store approval. Is there anything else I can do to get it to compile for the iPhone? Has anybody out there solved this?
After fiddling around with different compiler settings for hours, I found a solution that satisfies me and came to the following conclusion.
There is a surprisingly huge difference between debug and release settings with Eigen's template-based approach: with the usual optimisation flags enabled, the release build runs 20 to 40 times faster than the debug build. I have never seen such a difference before in any language; in my experience the factor is usually 1.5 to 3.
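To see this kind of gap for yourself, a small timing harness is enough. The sketch below does not use Eigen: SmallMatrix, multiply and timeProductsMs are hypothetical names, made up here as a stand-in for template code that relies on many tiny accessor functions being inlined. It also assumes a C++11 compiler for std::chrono, which is newer than the gcc 4.2 toolchain discussed above. Build it once without optimisation and once with the usual release flags, then compare the reported times.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for template-heavy matrix code (not Eigen itself).
// Every element access goes through a tiny function that only becomes
// cheap once the optimiser inlines it.
template <typename T, std::size_t N>
struct SmallMatrix {
    T data[N * N];
    T& operator()(std::size_t r, std::size_t c) { return data[r * N + c]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data[r * N + c]; }
};

// Plain triple-loop product of two N x N matrices.
template <typename T, std::size_t N>
SmallMatrix<T, N> multiply(const SmallMatrix<T, N>& a, const SmallMatrix<T, N>& b) {
    SmallMatrix<T, N> out{};  // value-initialised, so all elements start at zero
    for (std::size_t r = 0; r < N; ++r)
        for (std::size_t c = 0; c < N; ++c)
            for (std::size_t k = 0; k < N; ++k)
                out(r, c) += a(r, k) * b(k, c);
    return out;
}

// Runs `iterations` 4x4 float products and returns the elapsed time in
// milliseconds. One element of the result is written to `checksum` so the
// optimiser cannot discard the loop as unused work.
double timeProductsMs(int iterations, float& checksum) {
    SmallMatrix<float, 4> identity{};
    SmallMatrix<float, 4> acc{};
    for (std::size_t i = 0; i < 4; ++i) identity(i, i) = 1.0f;
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            acc(r, c) = static_cast<float>(r * 4 + c);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        acc = multiply(acc, identity);  // multiplying by identity keeps acc unchanged
    const auto stop = std::chrono::steady_clock::now();

    checksum = acc(1, 2);
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeProductsMs with a few million iterations from main and printing the result gives one number per build configuration. The exact ratio will depend on compiler, flags and device, so treat it as a rough indicator only.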
Although I still cannot enable vectorisation, i.e. the code compiles only with EIGEN_DONT_VECTORIZE defined, the resulting performance now fits my needs.
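For anyone who wants to check which configuration a build actually ended up with, here is a minimal sketch. It assumes the define is placed ahead of the first Eigen include (in a prefix header or in the build settings), and that the compiler predefines __ARM_NEON__ when invoked with -mfpu=neon, as gcc and clang do. The helper names compilerTargetsNeon, eigenVectorizationDisabled and describeSimdSetup are made up for this example and are not part of Eigen.

```cpp
#include <string>

// Assumption: in a real project this define lives in a prefix header or in
// the build settings, so that it is seen before any Eigen header.
#ifndef EIGEN_DONT_VECTORIZE
#define EIGEN_DONT_VECTORIZE
#endif
// #include <Eigen/Core>   // the Eigen include would follow here

// True when the compiler itself was told to target NEON (e.g. -mfpu=neon).
bool compilerTargetsNeon() {
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return true;
#else
    return false;
#endif
}

// True when Eigen has been asked not to use its vectorised code paths.
bool eigenVectorizationDisabled() {
#ifdef EIGEN_DONT_VECTORIZE
    return true;
#else
    return false;
#endif
}

// Human-readable summary, suitable for logging once at start-up.
std::string describeSimdSetup() {
    std::string text = "NEON enabled by compiler: ";
    text += compilerTargetsNeon() ? "yes" : "no";
    text += ", Eigen vectorisation disabled: ";
    text += eigenVectorizationDisabled() ? "yes" : "no";
    return text;
}
```

Logging describeSimdSetup() once makes it obvious when a target was built with NEON flags while Eigen's own vectorised paths are still switched off.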