c++ arrays pointers

How does array indexing work differently between 1D and 2D arrays in C++?


I'm confused about how array indexing actually works in C++. I understand that when we use the expression x[n] (where x is an address and n is an integer), the compiler treats [] as an operator that calculates an offset from the base address.

I was taught that [] moves the address by the size in bytes of the data type stored at that address, multiplied by the number inside the brackets (here n).

For a 1D array, I understand that arr[i] is interpreted as *(arr + i), which moves the address by i * sizeof(element_type) bytes. However, I'm confused about how this works with 2D arrays. Consider this example:

char country[][20] = {"U.S.A", "CHINA", "RUSSIA"};

If I access country[2], I get "RUSSIA" (the entire string), not the character 'S' as I might expect if the bracket operator simply moved 2 bytes from the start address.

Put another way: the array name country is the address of the first character, 'U', so I would expect country[2] to mean whatever lies 2 bytes after the address of 'U'. Why, then, is country[2] the sequence of characters "RUSSIA" when the 2D array is initialized as above?


Solution

  • How does array indexing work differently between 1D and 2D arrays in C++?

    It does not work differently at all. In a sense, that's because C++ (like C) does not have true 2D arrays. What is conventionally called a 2D array in C++ and C is a (1D) array whose elements are themselves (1D) arrays. That's why the built-in indexing operator takes only two operands (the array or pointer, and a single index), regardless of whether the array is (as we consider it) 1D, 2D, 3D, or higher dimensional.

    In particular, this ...

    char country[][20] = {"U.S.A", "CHINA", "RUSSIA"};
    

    ... declares country as an array of 3 (as determined from the initializer) arrays of 20 char each. That is, each of the expressions country[0], country[1], and country[2] designates an array of 20 char. The first is initialized from "U.S.A", the second from "CHINA", and the third from "RUSSIA".
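    The types involved can be checked at compile time. The following is a minimal sketch, assuming a C++11 or later compiler; row_count is a hypothetical helper name introduced only for illustration.

```cpp
#include <cstddef>
#include <type_traits>

char country[][20] = {"U.S.A", "CHINA", "RUSSIA"};

// The initializer lists three strings, so the outer bound is deduced as 3.
static_assert(std::is_same<decltype(country), char[3][20]>::value,
              "country is an array of 3 arrays of 20 char");

// Indexing once designates an element of the outer array: a char[20].
static_assert(std::is_same<decltype(country[2]), char (&)[20]>::value,
              "country[2] is an lvalue of type char[20]");

// Only indexing twice reaches an individual char.
static_assert(std::is_same<decltype(country[2][0]), char&>::value,
              "country[2][0] is an lvalue of type char");

// Hypothetical helper: number of elements (rows) in the outer array.
constexpr std::size_t row_count() {
    return sizeof(country) / sizeof(country[0]);
}
```

    If the static_asserts compile, the declaration has the shape described above, and row_count() evaluates to 3.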

    For a 1D array, I understand that arr[i] is interpreted as *(arr + i), which moves the address by i * sizeof(element_type) bytes.

    I think it's more helpful, and it's certainly more robust, to pin the semantics of both addresses and indexing to the array elements than it is to try to break it down in terms of bytes. But if you do break it down to bytes then your characterization is correct. And that's consistent with what you observe, because the relevant element type for indexing (once) into country is char[20]. Its size is 20 bytes, so country[2] lies 2 * 20 = 40 bytes past the start of the array, not 2 bytes.
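    For readers who do want to see the byte arithmetic, here is a small sketch. The helper names byte_distance, offset_of_row2 and offset_of_row2_col1 are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>

char country[][20] = {"U.S.A", "CHINA", "RUSSIA"};

// Hypothetical helper: distance in bytes between two addresses.
std::ptrdiff_t byte_distance(const void* from, const void* to) {
    return static_cast<const char*>(to) - static_cast<const char*>(from);
}

// country[2] means *(country + 2). In that expression country converts to a
// pointer to its first element, of type char (*)[20], so adding 2 advances
// by 2 * sizeof(char[20]) bytes.
std::ptrdiff_t offset_of_row2() {
    return byte_distance(country, country + 2);
}

// country[2][1] applies the same rule a second time, now with element type
// char, adding 1 * sizeof(char) to the offset of row 2.
std::ptrdiff_t offset_of_row2_col1() {
    return byte_distance(country, &country[2][1]);
}
```

    Here offset_of_row2() yields 40 and offset_of_row2_col1() yields 41. The same rule, i * sizeof(element_type), applies at both levels; only the element type changes.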

    If I access country[2], I get "RUSSIA" (the entire string),

    Yes, you do, more or less, because the elements of country, which are the units measured by indexes into that array, are arrays of type char[20], not individual chars.

    Do bear in mind, however, that "C string" is a characterization of the contents of an array, not part of such an array's data type. The expression country[2] identifies the whole array, which is more than "the entire string". The six characters of "RUSSIA" are what is emitted if, say, you feed country[2] to std::cout, but that's a formatted representation, which you should take care to distinguish from the thing itself.
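    The difference between the array and the string stored in it can be made concrete. This is only a sketch; row_size, row_string_length and tail_is_zero are hypothetical names used for illustration.

```cpp
#include <cstddef>
#include <cstring>

char country[][20] = {"U.S.A", "CHINA", "RUSSIA"};

// Size of the array object designated by country[2]: all 20 chars.
constexpr std::size_t row_size() { return sizeof(country[2]); }

// Length of the C string stored in that array: the chars before the first '\0'.
std::size_t row_string_length() { return std::strlen(country[2]); }

// The chars the initializer does not cover are zero-initialized, so the
// rest of the row after "RUSSIA" consists of '\0' chars.
bool tail_is_zero() {
    for (std::size_t i = row_string_length(); i < row_size(); ++i) {
        if (country[2][i] != '\0') return false;
    }
    return true;
}
```

    The array is 20 chars long while the string in it is 6 chars long; stream output stops at the first terminator, which is why only "RUSSIA" appears.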

    not the character 'S' as I might expect if the bracket operator simply moved 2 bytes from the start address.

    Yes. And if by this point I have not driven home why the single character 'S' is the wrong expectation, then I'm not sure what else to say.
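    For completeness, here is a sketch of expressions that do reach the 'S' in "U.S.A", that is, the char two bytes past the start. The function names are hypothetical.

```cpp
char country[][20] = {"U.S.A", "CHINA", "RUSSIA"};

// Index twice: first select row 0 (a char[20]), then char 2 within it.
char third_char_of_first_row() { return country[0][2]; }

// Equivalent pointer form: country[0] converts to a char* pointing at 'U',
// so adding 2 moves by 2 * sizeof(char) = 2 bytes.
char two_bytes_past_start() { return *(country[0] + 2); }
```

    Both return 'S'. The two-byte step only happens once the pointer being offset has element type char rather than char[20].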