Tags: c, cs50, implicit-conversion, string-literals, pointer-arithmetic

Trouble understanding char* and string in CS50


So I know that a string is just an array of characters that are stored consecutively in a computer's memory.

I also know that to find a string you only need the location of its first character, since the characters are stored consecutively, and that the string ends when the program or function encounters the \0 character.

But what I don't understand is:

  1. char* s = "HI!";
    

Does it create an array of 4 characters? Or is it just a pointer pointing to the starting character location? Or is it doing both?

2.

    char* name = "BEN";
    printf("%c %c\n", *(name + 1), *name + 1);

Why do the two expressions give different outputs (E and C) instead of both giving E?


Solution

  • In this declaration

    char* s = "HI!";
    

    Two entities are created.

    The first one is the string literal "HI!", which has static storage duration and the array type char[4]. (In C++, unlike in C, it has the constant character array type const char[4].)

    You can check the size of the array using printf; it prints 4, counting the terminating \0:

    printf( "sizeof( \"HI!\" ) = %zu\n", sizeof( "HI!" ) );
    

    The second entity is the pointer s itself. Here the character array is used as the initializer of s. In this context the array is implicitly converted to a pointer to its first element, so s holds the address of the first element of the array.

    As for this code snippet

    char* name = "BEN";
    printf("%c %c\n", *(name + 1), *name + 1);
    

    The expression name + 1 has type char * and, by pointer arithmetic, points to the second character of the string literal "BEN" (that is, 'E'). Dereferencing that pointer expression, as in *(name + 1), yields the character it points to. In fact, the expression *(name + 1) is the same as name[1], which is in turn the same as 1[name].

    As for the expression *name + 1: dereferencing the pointer name yields the first character 'B' of the string literal. Because unary * binds more tightly than binary +, 1 is then added to the character code of 'B', so the expression has the value of the character that follows 'B' in the character set, which is 'C'. The expression *name + 1 is equivalent to the expression name[0] + 1.

    Using the subscript operator, as in name[1] and name[0] + 1, makes the expressions clearer.

    You may also find it interesting that the printf call can be rewritten using only the string literal itself. Some examples:

    printf("%c %c\n", *( "BEN" + 1), *"BEN" + 1);
    

    or

    printf("%c %c\n", "BEN"[1], "BEN"[0] + 1);
    

    or even

    printf("%c %c\n", 1["BEN"], 0["BEN"] + 1);