
User Defined Literals for a String versus for a Hex Value


Regarding this question, why does a user-defined literal for a hex value map to a different literal operator than one for a string does? That is, why does the code

std::vector<uint8_t> val1 = 0x229597354972973aabbe7_hexvec;

map to

std::vector<uint8_t> operator"" _hexvec(const char*str)
{
    // Handles the form 0xFFaaBB_hexvec and 0Xf_hexvec
    size_t len = strlen(str);
    return convertHexToVec(str, len);   
}

while the code

std::vector<uint8_t> val2 = "229597354972973aabbe7"_hexvec;

maps to

std::vector<uint8_t> operator"" _hexvec(const char*str, std::size_t len)
{
    // Handles the forms "0xFFAABB"_hexvec and "12441AA"_hexvec
    return convertHexToVec(str, len);
}

What makes the size_t necessary when both are null-terminated strings? For that matter, why is 0x551A_hexvec a string at all? Why not an integer?


Solution

  • What makes the size_t necessary when both are null-terminated strings?

    There is no rule in C++ that a string literal cannot have NUL characters embedded in it: "Nul\0character" is a valid C++ string literal. When doing UDL processing, the C++ language wants to make sure that you know which bytes are actually part of the string. A bare pointer cannot tell you that, because strlen stops at the first embedded NUL. To know where the string really ends, you need a size.
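A minimal sketch of the difference, using a hypothetical _bytes suffix and a hypothetical visible_length helper (neither comes from the question): the string literal operator is handed the full length, so the bytes after the embedded NUL survive, whereas strlen on the same pointer stops at the first NUL.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical string literal operator: the compiler passes the literal's
// length (not counting the implicit terminating NUL), so embedded NULs are kept.
std::string operator"" _bytes(const char* str, std::size_t len)
{
    return std::string(str, len);
}

// Hypothetical helper for comparison: all that a length-less interface
// could recover from the same pointer. strlen stops at the first NUL.
std::size_t visible_length(const char* str)
{
    return std::strlen(str);
}
```

With these, "Nul\0character"_bytes.size() is 13, while visible_length("Nul\0character") is only 3.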

    The size parameter also lets the compiler tell literal operators meant for strings apart from those meant for numbers. The literal 21s could mean 21 seconds, while the literal "21"s can mean a std::string containing the characters "21". Both literals can be in scope at the same time without any kind of cross-talk.
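A short illustration using the standard library's own suffixes (note that these are C++14 additions, not C++11): both are spelled s, yet the compiler routes the numeric literal to the std::chrono operator and the string literal to the std::string operator, purely based on the kind of literal.

```cpp
#include <chrono>
#include <string>

// Both standard suffixes are named "s"; pulling both into scope is fine.
using namespace std::chrono_literals;  // 21s   -> std::chrono::seconds
using namespace std::string_literals;  // "21"s -> std::string

auto elapsed = 21s;    // numeric literal: selects the chrono operator
auto label   = "21"s;  // string literal: selects the std::string operator
```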

    Numeric literal UDL functions don't take a size_t, and that absence is what differentiates them from an overload intended for string literals. A numeric literal cannot contain a NUL character, so they lose nothing by not being given a size: strlen recovers it.
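As a sketch of the situation in the question, here is a hypothetical _which suffix (assumed purely for illustration) that declares both overloads side by side. Each one just reports which overload ran and what it received; the form of the literal, not its contents, selects the operator.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Raw literal operator: chosen for numeric literals such as 0x551A_which.
// No length parameter is needed: a numeric literal cannot contain a NUL,
// so strlen on the null-terminated spelling gives the full length.
std::string operator"" _which(const char* str)
{
    return "raw numeric: " + std::string(str, std::strlen(str));
}

// String literal operator: chosen for string literals such as "551A"_which.
std::string operator"" _which(const char* str, std::size_t len)
{
    return "string: " + std::string(str, len);
}
```

Here 0x551A_which yields "raw numeric: 0x551A" (the raw operator sees the 0x prefix as well), while "551A"_which yields "string: 551A".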

  • For that matter, why is 0x551A_hexvec a string at all? Why not an integer?

    Because that's what you asked for.

    Your UDL function for numeric literals can process either the raw literal data (as a string) or a synthesized literal value. If you declare the UDL with a single const char* parameter, you are asking to process the raw literal data: the characters of the literal exactly as written, minus the suffix.
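To contrast the two forms, a sketch with two hypothetical suffixes, _raw and _cooked (names assumed for illustration): the raw operator receives the literal's characters exactly as written, including the 0x prefix and the letter case, while the cooked operator receives only the value the compiler computed.

```cpp
#include <string>

// Raw form: receives the spelling of the literal, minus the suffix.
std::string operator"" _raw(const char* str)
{
    return std::string(str);
}

// Cooked (synthesized) form: receives the computed integer value.
unsigned long long operator"" _cooked(unsigned long long value)
{
    return value;
}
```

So 0x551A_raw gives "0x551A", and 0x551A_cooked gives 21786. If a single suffix declared both a raw and an unsigned long long operator, an integer literal would use the unsigned long long one.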

    A synthesized literal value is a value of a C++ type, computed from the literal using the regular rules for literals. For integer literals, the synthesized value is an unsigned long long, the largest standard unsigned integer type in C++ (for floating-point literals, it is a long double):

    std::vector<uint8_t> operator"" _hexvec(unsigned long long value);
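One possible body for that declaration, as a sketch. The byte layout is an assumption not taken from the question: most-significant byte first, leading zero bytes dropped, and the value 0 producing a single zero byte.

```cpp
#include <cstdint>
#include <vector>

// Synthesized-value form: the operator only ever sees the 64-bit value.
// Assumed layout: big-endian bytes, no leading zero bytes, 0 -> {0x00}.
std::vector<std::uint8_t> operator"" _hexvec(unsigned long long value)
{
    std::vector<std::uint8_t> bytes;
    do {
        // Prepend the lowest byte, so the most significant ends up first.
        bytes.insert(bytes.begin(), static_cast<std::uint8_t>(value & 0xFF));
        value >>= 8;
    } while (value != 0);
    return bytes;
}
```

Because the operator sees only the value, 21786_hexvec and 0x551A_hexvec both produce {0x55, 0x1A}; the hex spelling is gone by the time the function runs.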
    

    Of course, the fact that unsigned long long has a finite size is precisely why the raw literal version exists. The literal 0x229597354972973aabbe7 needs 82 bits, so it cannot fit into a typical 64-bit unsigned long long, but you may still want to fit it into the object you're generating. Therefore, you need access to the actual characters of the literal.
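A sketch of the raw version together with the string version from the question. The question does not show the body of convertHexToVec, so the one below is an assumption: it accepts an optional 0x/0X prefix, pads an odd digit count with a leading zero nibble, returns bytes most-significant first, and throws std::invalid_argument on any non-hex character. hexDigitValue is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: numeric value of one hex digit.
int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hexadecimal digit");
}

// Assumed body for the question's convertHexToVec: no 64-bit limit,
// because it works digit by digit on the characters.
std::vector<std::uint8_t> convertHexToVec(const char* str, std::size_t len)
{
    if (len >= 2 && str[0] == '0' && (str[1] == 'x' || str[1] == 'X')) {
        str += 2;  // skip the optional 0x / 0X prefix
        len -= 2;
    }
    std::vector<std::uint8_t> bytes;
    std::size_t i = 0;
    if (len % 2 != 0) {  // odd digit count: first byte holds one digit
        bytes.push_back(static_cast<std::uint8_t>(hexDigitValue(str[0])));
        i = 1;
    }
    for (; i < len; i += 2) {  // remaining digits come in pairs
        int high = hexDigitValue(str[i]);
        int low = hexDigitValue(str[i + 1]);
        bytes.push_back(static_cast<std::uint8_t>(high * 16 + low));
    }
    return bytes;
}

// Raw literal operator: 0x229597354972973aabbe7_hexvec arrives as the
// characters "0x229597354972973aabbe7".
std::vector<std::uint8_t> operator"" _hexvec(const char* str)
{
    return convertHexToVec(str, std::strlen(str));
}

// String literal operator: "229597354972973aabbe7"_hexvec.
std::vector<std::uint8_t> operator"" _hexvec(const char* str, std::size_t len)
{
    return convertHexToVec(str, len);
}
```

Both spellings of the 21-digit value produce the same 11 bytes, starting 0x02, 0x29 and ending 0xE7. Note that the raw operator also receives decimal literals, so 12_hexvec is read as the hex digits "12", and a floating-point literal such as 1.5_hexvec compiles but throws at run time.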