Scatter/Gather in Xeon Phi


I was reading Intel's manual on the Xeon Phi instruction set, but I couldn't understand how the scatter/gather instructions work.

Suppose I have the following vector of doubles (highest element on the left):

A-> |b4|a4|b3|a3|b2|a2|b1|a1|

Is it possible to create 4 vectors as follows:

V1->|b1|a1|b1|a1|b1|a1|b1|a1|
V2->|b2|a2|b2|a2|b2|a2|b2|a2|
V3->|b3|a3|b3|a3|b3|a3|b3|a3|
V4->|b4|a4|b4|a4|b4|a4|b4|a4|

using these instructions? Is there any other way to achieve this?


Solution

  • I got this from the Intel Forums (answered by Evgueni Petrov). It uses a broadcasting load rather than scatter/gather:

    __m512d V1 = _mm512_castsi512_pd(_mm512_extload_epi32(Addr, _MM_UPCONV_EPI32_NONE, _MM_BROADCAST_4X16, _MM_HINT_NONE));

    where 'Addr' is a double pointer to the memory from which we loaded the doubles into vector 'A' (so Addr[0] = a1 and Addr[1] = b1).

    We can do the same for V2, V3 and V4 by passing Addr+2, Addr+4 and Addr+6, respectively.
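Why does a 32-bit (`epi32`) load produce the right vector of doubles? As far as I understand the Knights Corner load conversions, 64-bit loads only offer 1-to-8 and 4-to-8 broadcasts, so there is no direct way to replicate just two doubles. `_MM_BROADCAST_4X16` on a 32-bit load reads four 32-bit elements (16 bytes, i.e. exactly one a/b pair) and repeats them in every 128-bit lane; reinterpreting the result as `__m512d` then gives |b1|a1|b1|a1|... The sketch below is a plain C model of that lane pattern, not the intrinsic itself, so it can be checked on any machine. The function names `broadcast_4x16_epi32` and `make_pair_broadcasts` are hypothetical, made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 4-to-16 broadcast of 32-bit elements:
   read four 32-bit values (16 bytes) and repeat them across all sixteen
   32-bit lanes (64 bytes) of the destination. Not the real intrinsic. */
void broadcast_4x16_epi32(const void *src, void *dst)
{
    uint32_t in[4], out[16];
    memcpy(in, src, sizeof in);          /* 16 bytes = one pair of doubles */
    for (int i = 0; i < 16; ++i)
        out[i] = in[i % 4];              /* lane i takes element i mod 4 */
    memcpy(dst, out, sizeof out);        /* 64 bytes = eight doubles */
}

/* Hypothetical helper: builds V1..V4 from the memory that vector A was
   loaded from (addr[0] = a1, addr[1] = b1, addr[2] = a2, ...). */
void make_pair_broadcasts(const double *addr, double v[4][8])
{
    for (int k = 0; k < 4; ++k)
        broadcast_4x16_epi32(addr + 2 * k, v[k]);  /* Addr, Addr+2, Addr+4, Addr+6 */
}
```

With `addr` holding {a1, b1, a2, b2, a3, b3, a4, b4}, `v[k]` ends up as {a(k+1), b(k+1)} repeated four times, which printed highest element first is the question's |b1|a1|b1|a1|... layout. One more thing to keep in mind (an assumption based on how KNC memory operands are usually described): the broadcast source probably needs to be aligned to the 16 bytes it reads, which Addr+2, Addr+4 and Addr+6 satisfy as long as 'Addr' itself is 16-byte aligned.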