c++ string memory malloc cout

C++ character pointer string allocated with malloc contains gibberish when printed


As far as I can tell, I'm using malloc correctly to allocate a string as a character pointer.

  char* test = (char*) malloc (sizeof (char) * 4);

  test[0] = 't';
  test[1] = 'e';
  test[2] = 's';
  test[3] = 't';

  std::cout << test << "\n";

When I print the string, there are extra ascii characters at the end.

test²²²²ßE╜ε┬ô

My hunch is that the characters in "test" are taking up less than the max memory I'm allocating. The extra text I'm seeing might be excess memory that hasn't been written with anything meaningful. I won't necessarily know the exact length at allocation time, though, since that string will be assembled dynamically.

My question is how to refactor this to trim that fat off. If I know the character length of the string, can I truncate the final product somehow? Is there a better way to allocate that space that I'm unaware of? Is there a way to limit the number of characters that are printed?


Solution

  • You have two problems:

    1. You're not allocating enough memory to hold a complete C-style string, which requires one additional char to hold the NUL terminator.
    2. You're not writing the NUL terminator, nor using a zeroing allocator, so the string wouldn't be NUL-terminated even if you did allocate enough memory.

    malloc doesn't return pre-zeroed memory, so if you build a C-style string in the memory it returns, you must NUL-terminate it yourself. Otherwise C-string APIs (including operator<< for char*) keep reading past the end of your buffer, which is undefined behavior: they stop only when they happen to hit a NUL byte, or the program crashes. What you're seeing is not excess memory from your allocation, but whatever happens to sit in adjacent heap memory. (malloc often overallocates a bit to round off allocation sizes, but there's no guarantee that's happening in any particular case.)

    For your code, just allocate the extra char for the NUL (you always need one more char than what you expect to put in the string) and add it explicitly and you're fine:

      char* test = (char*) malloc (sizeof (char) * 5);  // Allocate one extra character for the NUL
    
      test[0] = 't';
      test[1] = 'e';
      test[2] = 's';
      test[3] = 't';
      test[4] = '\0';  // NUL terminate
    
      std::cout << test << "\n";
    

    Alternatively, use calloc to zero the whole buffer up front, so any byte you don't set is guaranteed to be zero:

      char* test = (char*)calloc(5, sizeof(char));  // Allocate one extra character for the NUL with zero-ing calloc
    
      test[0] = 't';
      test[1] = 'e';
      test[2] = 's';
      test[3] = 't';
      // No need to add NUL, test[4] guaranteed to be NUL already
    
      std::cout << test << "\n";
    

    Of course, in idiomatic C++, you'd almost never use malloc, nor would you use C-style strings unless forced to do so (std::string exists, use it!), so idiomatic C++ would look like the much simpler:

    #include <iostream>  // std::cout
    #include <string>
    
    using namespace std::literals;  // enables the "..."s and "..."sv suffixes
    
    int main() {
        auto test = "test"s;  // std::string
        // Or if you don't like using literals:
        // std::string test("test");
        // Or if you want to guarantee no allocation on C++17 and higher,
        // with a #include <string_view> and the std::literals usage:
        // auto test = "test"sv;
    
        std::cout << test << "\n";
    }
    

    any of which avoids the need to manually match allocation size to string size, making it much harder to shoot yourself in the foot like you just did.