c++ assembly intel simd intel-mic

Manually control Intel MIC SIMD operations by intrinsics or instructions


I want to manually control the SIMD operations in my code on MIC, so I wrote the intrinsics below:

_k_mask = _mm512_int2mask(0x7ff); // 0000 0111 1111 1111
_tempux2_512 = _mm512_mask_loadunpacklo_ps(_tempux2_512,_k_mask, &u_x[POSITION_INDEX_X(k,j,i-5)]);
_tempux2_512 = _mm512_mask_loadunpackhi_ps(_tempux2_512,_k_mask, &u_x[POSITION_INDEX_X(k,j,i-5)]+16);

But building with icpc fails with these error messages:

test.cpp:574: undefined reference to `_mm512_mask_extloadunpacklo_ps'
test.cpp:575: undefined reference to `_mm512_mask_extloadunpackhi_ps'

It compiles fine if I use _mm512_mask_load_ps instead, but my data cannot be 64-byte aligned, so _mm512_mask_load_ps would cause a runtime error.

Then I tried writing the inline asm block by hand, like this:

MOV rax,0x7ff
KMOV k1,rax
VMOVAPS zmm1 {k1}, [data_512_1]
VMOVAPS zmm2 {k1}, [data_512_2]
VMULPS  zmm3 {k1}, zmm2 zmm1
VMOVAPS [data_512_3] {k1}, zmm3

And icpc reports errors again:

test_simd.cpp(30): (col. 10) error: Unknown opcode KMOV in asm instruction .
test_simd.cpp(33): (col. 10) error: Syntax error ZMM1 in asm instruction vmulps.

I'm a beginner in assembly language. I would be really grateful if anyone could tell me why icpc can't find the reference and how to fix it, or could recommend some material to read. (I've read the Intel® Xeon Phi™ Coprocessor Instruction Set Architecture Reference Manual but still don't know how to write this.)

Thanks a lot.


Solution

  • It appears that you are targeting the AVX-512 instruction set, which is to be implemented in future desktop processors and Xeon Phi co-processors. The current generation of Xeon Phi uses a different instruction set, typically referred to as KNCNI or K1OM, which is similar to AVX-512 but incompatible with it. In particular, AVX-512 supports misaligned load instructions, whereas KNCNI suggests using a pair of load-unpack-lo + load-unpack-hi instructions for the same purpose. To compile for KNCNI you should use the -mmic option of the Intel Compiler. Think of it as an alternative to the -m64 option, which makes the compiler target the x86-64 ISA: code built with -mmic will not run on normal x86-64 processors, and vice versa. AFAIK AVX-512 is not yet supported in public releases of the Intel Compiler, but most likely it will use a new -x option.
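To make the load-unpack-lo + load-unpack-hi pairing concrete, here is a minimal sketch rather than a definitive implementation. The helper name masked_unaligned_load and the scalar fallback are assumptions added for illustration and are not from the question. The intrinsic path is compiled only when __MIC__ is defined (the Intel Compiler defines it under -mmic) and should be treated as untested. The fallback models the combined effect of the two instructions as I understand it: consecutive floats from memory are placed into the lanes whose mask bit is set.

```cpp
#include <cassert>
#include <cstddef>
#ifdef __MIC__
#include <immintrin.h>
#endif

// Hypothetical helper (name is an assumption): loads consecutive floats from a
// possibly misaligned address `src` into the lanes of `dst` selected by `mask`.
// `dst` must hold 16 floats and be 64-byte aligned; unselected lanes become 0.
void masked_unaligned_load(float* dst, const float* src, unsigned mask) {
    assert(dst != nullptr && src != nullptr);
#ifdef __MIC__
    // KNC path: the lo/hi pair together performs one misaligned masked load.
    __mmask16 k = _mm512_int2mask(static_cast<int>(mask));
    __m512 v = _mm512_setzero_ps();
    v = _mm512_mask_loadunpacklo_ps(v, k, src);       // part below the 64-byte boundary
    v = _mm512_mask_loadunpackhi_ps(v, k, src + 16);  // part above it (address + 64 bytes)
    _mm512_store_ps(dst, v);                          // aligned store into dst
#else
    // Scalar stand-in for hosts without KNC: the next memory element goes to
    // the next lane whose mask bit is set.
    std::size_t next = 0;
    for (int lane = 0; lane < 16; ++lane) {
        dst[lane] = (mask & (1u << lane)) ? src[next++] : 0.0f;
    }
#endif
}
```

With the mask 0x7ff from the question, this fills lanes 0 to 10 with the 11 floats starting at src and leaves lanes 11 to 15 at zero. On the coprocessor you would build it with something like icpc -mmic test.cpp, which should also resolve the undefined references to the load-unpack intrinsics.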