c  malloc  c-strings  sizeof  null-terminated

sizeof('\0'): the null terminator as a literal is four bytes, so why does it take only one byte in a string of characters?


In C, the null terminator '\0' written as a literal takes 4 bytes (it is just zero internally), so why does it take only 1 byte when it is used in an array of characters or a string? Is this compiler magic?

Does a programmer need to take special care of the null terminator's size when using dynamic memory allocation? Is the program below fine?

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   // sizeof yields a size_t, so the matching format specifier is %zu
   printf("size of null-termination: %zu\n", sizeof('\0')); //outputs 4 bytes
   printf("size of 0: %zu\n", sizeof(0)); // outputs 4 bytes

   char *message = malloc(10);
   if (message == NULL)
      return 1; // allocation failed

   message[0] = 'A';
   message[1] = 'B';
   message[2] = 'C';
   message[3] = '\0'; // takes 1-byte in below memory layout(attached image)

   message[4] = 'a';
   message[5] = 'b';
   message[6] = 'c';
   message[7] = '\0'; // takes 1-byte in below memory layout(attached image)

   message[8] = 'X';
   message[9] = 'Y';

   printf("\n");
   free(message);
   return 0;
}

[Image: memory layout of the message buffer]


Solution

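  • In C, unlike in C++, '\0' is an integer character constant, and it has type int. That is why sizeof('\0') equals sizeof(int), which is 4 on your platform. In C++ the same constant has type char.

    No compiler magic is involved in the array case. Each element of message has type char, so the assignment message[3] = '\0' converts the int value 0 to char and stores a single byte. Likewise, within a character string literal such an escape sequence is stored as one character.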

    From the C Standard (6.4.4.4 Character constants)

    10 An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer.

    and (6.4.5 String literals)

    6 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.