I have written a simple C program, shown below. With it I want to find out where the constant string "abcd" is stored. My first guess was that it would be stored in the .data section, as read-only data. After a test on Debian, however, things turned out differently from what I had guessed: checking the assembly code generated by gcc, I found the string placed in the stack frame of function p. But when I tried it later on OS X, the string was stored in the .data section again. Now I am confused. Is there any standard for where constant strings are stored?
#include <stdio.h>
char *p()
{
    char p[] = "abcd";
    return p;
}
int main()
{
    char *pp = p();
    printf("%s\n", pp);
    return 0;
}
UPDATE: rici's answer opened my eyes. On OS X, the literal is first stored in .data and then copied into the function's stack frame, so it becomes a local variable of that function. gcc on Debian handles this differently: it stores the literal directly on the stack instead of copying it from .data. I'm sorry for my carelessness.
There is a huge difference between:
const char s[] = "abcd";
and
const char* t = "abcd";
The first of these declares s to be an array object initialized from the string "abcd". s will have an address distinct from that of any other object in the program. The character string itself might be a compile-time artifact; the initialization is a copy, so the character string does not need to be present at runtime if the compiler can find some other way of performing the initialization (such as a store-immediate operation).
The second declaration declares t to be a pointer to a string constant. The string constant now must be present at runtime, because expressions like t+1, which are pointers inside the string, are valid. The language standard does not guarantee that every occurrence of a string literal in the program is unique, nor does it guarantee that all occurrences are merged (although good compilers will try to do the latter). It does, however, guarantee that string literals have static lifetime.
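To make the distinction concrete, here is a minimal sketch. The function names tail_of_literal and local_copy_size are hypothetical, chosen only for illustration. The first returns a pointer into the middle of a literal, which is fine because the literal has static lifetime; the second only reports the size of the local array copy, since the array itself must not escape the function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer one character into a string literal.
   The literal has static lifetime, so the pointer is still valid after return. */
const char *tail_of_literal(void)
{
    const char *t = "abcd";
    return t + 1;               /* points at "bcd" inside the literal */
}

/* Hypothetical helper: the array is a local copy of the literal.
   Only its size is returned; the array itself ends with the function. */
size_t local_copy_size(void)
{
    const char s[] = "abcd";    /* five bytes: 'a', 'b', 'c', 'd', '\0' */
    return sizeof s;
}
```

Calling tail_of_literal() yields a string that compares equal to "bcd", and local_copy_size() yields 5 because the terminating null byte is part of the array.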
Consequently, this leads to undefined behaviour as soon as the returned pointer is used, because the lifetime of the array s ends when the function returns:
const char *gimme_a_string() {
    const char s[] = "abcd";
    return s;
}
However, this is fine:
const char *gimme_a_string() {
    const char *s = "abcd";
    return s;
}
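If the caller needs a modifiable string rather than a pointer to a literal, one common alternative is to let the caller own the storage. The following is only a sketch under that assumption; fill_string and its parameters are hypothetical names, not something from the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies "abcd" into a caller-supplied buffer.
   Returns dst on success, or NULL if the buffer is missing or too small. */
char *fill_string(char *dst, size_t dst_size)
{
    static const char src[] = "abcd";   /* static lifetime, never modified */

    if (dst == NULL || dst_size < sizeof src)
        return NULL;

    memcpy(dst, src, sizeof src);       /* sizeof src includes the '\0' */
    return dst;
}
```

The buffer lives in the caller's frame (or wherever the caller allocated it), so it stays valid after fill_string returns, and the caller is free to modify it.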
Also:
const char s[] = "abcd";
const char t[] = "abcd";
printf("%d\n", s == t);
is guaranteed to print 0, while
const char* s = "abcd";
const char* t = "abcd";
printf("%d\n", s == t);
might print either 0 or 1, depending on the implementation. (As written, it will almost certainly print 1. However, if the two declarations are in separate compilation units and LTO is not enabled, it is likely to print 0.)
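Because pointer identity is not something to rely on here, comparing contents with strcmp is the portable way to ask whether two strings are equal. A small sketch, with hypothetical function names, that only asserts what the language guarantees:

```c
#include <string.h>

/* Returns 1 if two separately declared arrays hold equal contents
   but live at different addresses. Distinct objects always do. */
int arrays_equal_but_distinct(void)
{
    const char s[] = "abcd";
    const char t[] = "abcd";
    return (&s[0] != &t[0]) && (strcmp(s, t) == 0);
}

/* Returns 1 if the two literals have equal contents.
   Whether s == t also holds is left to the implementation. */
int literals_have_equal_contents(void)
{
    const char *s = "abcd";
    const char *t = "abcd";
    return strcmp(s, t) == 0;
}
```

Both functions return 1 on any conforming implementation, regardless of whether the compiler merges identical literals.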
Since the array form is initialized with a copy, the non-const version is fine:
char s[] = "abcd";
s[3] = 'C';
But the char pointer version must be const to avoid undefined behaviour.
// Will produce a warning on many compilers when the relevant option is enabled (e.g. -Wwrite-strings for GCC in C mode)
char* s = "abcd";
// *** UNDEFINED BEHAVIOUR *** Can cause random program breakage
s[3] = 'C';
Technically, the non-const declaration of s is legal (which is why the compiler only warns) because it is the attempt to modify the constant which is UB. But you should always heed compiler warnings; it is better to think of the declaration / initialization as wrong, because it is.