cstring

Why does \0 not affect the length of a string in C?


Consider this example, where I append some extra \0 characters to a string:

#include <stdio.h>
#include <string.h>
int main(int argc, char **argv){
    char str1[] = "dog";
    char str2[] = "dog\0";
    char str3[] = "dog\0\0";
    printf("%ld %ld %ld\n", strlen(str1), strlen(str2), strlen(str3));
    return 0;
}

But strlen() returns 3 for all of them. Now:

  1. Why does the extra \0 not cause the size to increase?
  2. If the return value of strlen() is the same, is their actual size in memory also the same?

Solution

  • The declaration

    char str1[] = "dog";
    

    is equivalent to:

    char str1[] = { 'd', 'o', 'g', '\0' };
    

    The declaration

    char str2[] = "dog\0";
    

    is equivalent to:

    char str2[] = { 'd', 'o', 'g', '\0', '\0' };
    

    The declaration

    char str3[] = "dog\0\0";
    

    is equivalent to:

    char str3[] = { 'd', 'o', 'g', '\0', '\0', '\0' };
    

    In C, a string is a contiguous sequence of characters whose end is marked by a null character ('\0'). The function strlen returns the number of characters that precede the first null character, so the terminator itself is not counted. This value is different from the size of the array that contains the string.

    1. Why does the extra \0 not cause the size to increase?

    In all three cases, the fourth character is the null character that marks the end of the character sequence, so the length of the string is 3 in all cases. Anything after that first null character, including the extra \0 characters, is not part of the string. Only the sizes of the arrays that contain the strings differ.

    2. If the return value of strlen() is the same, is their actual size in memory also the same?

    The strings themselves occupy the same amount of memory: four bytes each, counting the terminating null character. But the sizes of the arrays that contain the strings are different. You can print the sizes of the arrays like this:

    #include <stdio.h>
    
    int main( void )
    {
        char str1[] = "dog";
        char str2[] = "dog\0";
        char str3[] = "dog\0\0";
    
        printf( "%zu %zu %zu\n", sizeof str1, sizeof str2, sizeof str3 );
    }
    

    This program has the following output:

    4 5 6
    

    The function strlen returns the length of the string, whereas the sizeof operator yields the size of the array in bytes. Because sizeof(char) is 1, that size equals the number of elements in the array.

    Note that both strlen and sizeof yield a value of type size_t, and %zu is the correct conversion specification for that type. The %ld conversion specification is for the type long, which is not used here. By using an incorrect conversion specification, your program invokes undefined behavior.