c++ simd intrinsics instruction-set

SIMD: _mm_store_si128 / _mm_storeu_si128 do not store bytes in the expected order


I have an array of bytes:

const signed char From[] = {
    0b00000000, 0b00000001, 0b00000010, 0b00000011,
    0b00000100, 0b00000101, 0b00000110, 0b00000111,
    0b00001000, 0b00001001, 0b00001010, 0b00001011,
    0b00001100, 0b00001101, 0b00001110, 0b00001111,
};

I need to place the chars from this array into a __m128i vector and then store the bytes of that vector into a std::uint32_t Schedule[4] array.

Doing this:

const __m128i Chars = _mm_set_epi8 (
    From[0],
    From[1],
    From[2],
    From[3],
    From[4],
    From[5],
    From[6],
    From[7],
    From[8],
    From[9],
    From[10],
    From[11],
    From[12],
    From[13],
    From[14],
    From[15]
);

Then I store this vector into Schedule:

_mm_store_si128((__m128i*)&Schedule[0], Chars);

The result in Schedule has an incorrect byte order (or perhaps it is not the byte order at all; it looks to me as if the bits were mixed up). I print it like this:

template<typename Type>
[[ nodiscard ]] std::string to_binary(Type Data) noexcept
{
    std::string Result = "";

    std::uint64_t CurrentBit = 8 * sizeof Data;
    while(CurrentBit--)
        Result += ((Data >> CurrentBit) & 1) == 0 ? "0" : "1";

    return Result;
}

std::cout << to_binary(Schedule[0]) << std::endl;
std::cout << to_binary(Schedule[1]) << std::endl;
std::cout << to_binary(Schedule[2]) << std::endl;
std::cout << to_binary(Schedule[3]) << std::endl;

Full code:

# include <x86intrin.h>
# include <iostream>
# include <string>
# include <cstdint>

template<typename Type>
[[ nodiscard ]] std::string to_binary(Type Data) noexcept
{
    std::string Result = "";

    std::uint64_t CurrentBit = 8 * sizeof Data;
    while(CurrentBit--)
        Result += ((Data >> CurrentBit) & 1) == 0 ? "0" : "1";

    return Result;
}

int main(void)
{
    std::uint32_t Schedule[4];
    const signed char From[] = {
        0b00000000, 0b00000001, 0b00000010, 0b00000011,
        0b00000100, 0b00000101, 0b00000110, 0b00000111,
        0b00001000, 0b00001001, 0b00001010, 0b00001011,
        0b00001100, 0b00001101, 0b00001110, 0b00001111,
    };

    const __m128i v = _mm_set_epi8 (
        From[0],
        From[1],
        From[2],
        From[3],
        From[4],
        From[5],
        From[6],
        From[7],
        From[8],
        From[9],
        From[10],
        From[11],
        From[12],
        From[13],
        From[14],
        From[15]
    );

    _mm_store_si128((__m128i*)&Schedule[0], v);
    std::cout << to_binary(Schedule[0]) << std::endl;
    std::cout << to_binary(Schedule[1]) << std::endl;
    std::cout << to_binary(Schedule[2]) << std::endl;
    std::cout << to_binary(Schedule[0]) << std::endl;
}

How can I fix it? I need the bytes in Schedule to be in the same order as in From.


Solution

  • Your issue is that _mm_set_epi8 takes its parameters starting from the most significant byte, so the first argument ends up at the highest address when stored. Use _mm_setr_epi8 instead, or reverse the order of the parameters.

    Also you have a typo:

        std::cout << to_binary(Schedule[0]) << std::endl;
        std::cout << to_binary(Schedule[1]) << std::endl;
        std::cout << to_binary(Schedule[2]) << std::endl;
        std::cout << to_binary(Schedule[0]) << std::endl;
    

    The last line prints Schedule[0] again; it should be Schedule[3].

    It is also an error to use _mm_store_si128 with a destination that is not 16-byte aligned. This may or may not crash. Use _mm_storeu_si128, or align the destination to 16 bytes (for example with alignas(16)).