c gcc x86 endianness nvcc

Why don't multi-character literals respect architecture endianness?


With GCC on Intel x86, and similarly with NVCC (CUDA),

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
int main() {
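    uint32_t v = 'abcd';          // multi-character literal, implementation-defined value
    uint32_t w = 0x61626364;      // ASCII codes of a, b, c, d, most significant first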
    bool b = v == w;
    printf("%d", b);
    return 0;
}

gives 1.

Question: I know that multi-character literals are "implementation-defined", but why don't multi-character literals respect architecture endianness?

On little-endian, it should be 0x64636261.

To work around this, is there a #pragma or other compile-time solution that makes 'abcd' expressions evaluate to the little-endian 0x64636261? Or, if really needed, a preprocessor macro (but without constexpr)?

I have already read "Why do multicharacter literals exist in C and C++?" and "multicharacter literal misunderstanding", but they don't answer my question.


Solution

  • why don't multi-character literals respect architecture endianness?

    The behaviour is implementation-defined.

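    That said, endianness describes how the bytes of a value are laid out in memory, not the value itself. 0x61626364 is stored in reverse order on LE (64 63 62 61), and so is 'abcd' in the compilers you tried, because both expressions have the same value. GCC, for instance, documents that it evaluates a multi-character constant one character at a time, shifting the previous value left by the width of a character and OR-ing in the next one. So I don't see how you think it's not respecting the LE nature of the system.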
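To make the difference between a value and its storage order visible, the bytes of a 32-bit object can be copied out and printed in address order. This is only a minimal sketch; `store_order` and `show` are hypothetical helper names, not part of any library.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit value into out[0..3],
   lowest address first. (store_order is a hypothetical helper name.) */
void store_order(uint32_t value, unsigned char out[4]) {
    memcpy(out, &value, sizeof value);
}

/* Print a value next to its bytes as they sit in memory.
   (show is a hypothetical helper name.) */
void show(const char *label, uint32_t value) {
    unsigned char bytes[4];
    store_order(value, bytes);
    printf("%-12s value 0x%08x, memory: %02x %02x %02x %02x\n",
           label, (unsigned)value,
           (unsigned)bytes[0], (unsigned)bytes[1],
           (unsigned)bytes[2], (unsigned)bytes[3]);
}
```

Calling `show("w", 0x61626364)` and `show("v", 'abcd')` on the x86 setup from the question should report the same memory bytes, 64 63 62 61, for both, since the two values compare equal there.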

    is there a #pragma or other compile-time solution to make 'abcd' expressions interpreted as little-endian 0x64636261?

    To get 0x64636261 (i.e. the bytes 61 62 63 64 in memory on an LE machine), use

    (uint32_t)'a' | (uint32_t)'b' << 8 | (uint32_t)'c' << 16 | (uint32_t)'d' << 24
    

    You could, of course, use a macro or function to hide this mess.
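One possible way to hide it is sketched below. The names `PACK4_LE`, `TAG_ABCD` and `pack4_le` are made up for illustration. The extra `(unsigned char)` cast is an addition to the expression above; it is there so that a character with a negative `char` value is not sign-extended into the upper bytes. The macro expands to a constant expression, so it works in a static initializer without constexpr.

```c
#include <stdint.h>

/* Hypothetical macro: packs four characters so that the first one
   lands in the least significant byte of the result. */
#define PACK4_LE(a, b, c, d)                     \
    ((uint32_t)(unsigned char)(a)       |        \
     (uint32_t)(unsigned char)(b) << 8  |        \
     (uint32_t)(unsigned char)(c) << 16 |        \
     (uint32_t)(unsigned char)(d) << 24)

/* The macro is a constant expression, so it can initialize a static object. */
static const uint32_t TAG_ABCD = PACK4_LE('a', 'b', 'c', 'd');

/* Hypothetical function form for data only known at run time:
   packs the first four characters of s the same way. */
uint32_t pack4_le(const char s[4]) {
    return PACK4_LE(s[0], s[1], s[2], s[3]);
}
```

With ASCII, `PACK4_LE('a', 'b', 'c', 'd')` evaluates to 0x64636261 on any target, because it is built from shifts on values rather than from the memory layout.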