c++ string g++ escaping string-view

Why does gdb octal-escape characters when printing a string built from a char array containing control characters?


I noticed that when I construct a std::string_view (or a std::string) from a character array containing control characters (e.g. '\001' [Start Of Heading]), gdb prints some of the ordinary characters that follow a control character as octal escapes. In my case, this affects the digits between each control character and the next '='.

Example:

#include <array>
#include <iostream>
#include <string>
#include <string_view>

int main()
{
    // 20 explicit characters; the remaining 12 elements are zero-initialized.
    const std::array<char, 32> myArr =
    { '8', '=', 'F', 'I', 'X', 'T', '.', '1', '.', '1',
    '\001', '9', '=', '9', '0', '\001', '3', '5', '=', 'A' };

    // data() returns a pointer on every implementation; begin() is only
    // guaranteed to return an iterator, which need not be a pointer.
    const std::string_view view(myArr.data(), myArr.size());
    const std::string      str (myArr.data(), myArr.size());

    std::cout << "VIEW:   " << view << std::endl;
    std::cout << "STRING: " << str  << std::endl;

    return 0;
}

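Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, this outputs the following (the control characters and the trailing null characters are not visible in the terminal):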

VIEW:   8=FIXT.1.19=9035=A
STRING: 8=FIXT.1.19=9035=A

Looking at both view and str in the gdb debugger, we see:

"8=FIXT.1.1\001\071=90\001\063\065=A", '\000' <repeats 11 times>

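We can see that after each occurrence of '\001', the digits that immediately follow it (the "key" values) are octal-escaped: '9' is shown as \071 and "35" as \063\065, while the digits in "=90" are printed normally.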

Why is this the case?

If I remove the control characters from the array, this escaping does not happen. In my real-world use case, however, the array must contain control characters.

I do want to note that this is not a problem since substring lookups still work perfectly fine.

I'm just curious to know why gdb gives us this representation.


Solution

  • Think about what the output would look like if GDB did not escape the digits that follow the control character:

    "8=FIXT.1.1\0019=90\00135=A", '\000' <repeats 11 times>
    

    It would be hard to tell where each octal escape sequence ends. By escaping the following digit as well, GDB is doing you a service: it shows exactly where the next character begins.

    If you put non-digits after the control characters, GDB does not need to escape them because they cannot be confused with part of the escape sequence.
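
The rule described above can be illustrated with a small sketch. This is not GDB's source code; escape_like_gdb is a hypothetical helper that only mimics the observed behavior, under the assumption that a printable digit is escaped exactly when the previous character was emitted as an octal escape. It deliberately ignores other things GDB does, such as escaping quotes and backslashes or collapsing repeated characters.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper (not part of GDB): renders bytes roughly the way the
// GDB output in the question looks. Simplified on purpose: quotes,
// backslashes and "<repeats N times>" are not handled.
std::string escape_like_gdb(std::string_view in)
{
    std::string out;
    bool prev_was_escape = false;  // was the previous byte written as \ooo?

    for (unsigned char c : in) {
        const bool printable = std::isprint(c) != 0;
        const bool digit     = std::isdigit(c) != 0;

        if (printable && !(prev_was_escape && digit)) {
            // Safe to print as-is: it cannot be mistaken for escape digits.
            out += static_cast<char>(c);
            prev_was_escape = false;
        } else {
            // Non-printable byte, or a digit right after an escape:
            // write it as a three-digit octal escape.
            char buf[5];
            std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
            out += buf;
            prev_was_escape = true;
        }
    }
    return out;
}
```

Calling escape_like_gdb on the first 20 characters of the array from the question yields 8=FIXT.1.1\001\071=90\001\063\065=A, which matches the text GDB printed: '9' directly after '\001' becomes \071, "35" becomes \063\065 because each digit follows an escape, and the digits in "=90" are left alone because '=' is printed verbatim before them.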