cmemory-layoutdata-segment

Why data segment of C was separated as two sections?


All globally initialised values are stored in .data segment .i.e. initialised data segment and uninitialised values are stored in bss and compiler initialise those uninitialised values to zero automatically in bss. Then why data segment is separated as .data and bss.

Whether it has an advantage or not ? Or any benefit?


Solution

  • The C programming language (it is a specification written in English) does not know about .bss or .data or data segments. Check that by reading n1570 (the latest draft of C11).

    In some cases (e.g. embedded computing) you don't have any data segment. For example when you cross-compile for an Arduino, the resulting code gets uploaded to the flash memory of that microcontroller (and data is in RAM, which your program would perhaps explicitly clear).

    (most of my answer below is focused on Linux, but you could adapt it to other OSes)

    Regarding the data segment on Unix-like systems like Linux, read more the specification of ELF. It is convenient to avoid spending file space in the executable file for something (the .bss) which is all zeros. The kernel code for execve(2) is able to set up an initial virtual address space with some cleared data (which is not mapped to some file segment). See also mmap(2) & elf(5) & ld-linux(8). Try some cat /proc/$$/maps command to understand the virtual address space of your shell. See proc(5).

    So the executable contains segments (some of which are all zeros and don't take any disk space), and is constructed by the linker -which also handle relocations- from several object files provided by the compiler from source code files. On Linux, use objdump(1) and readelf(1) (and nm(1)...) to explore and inspect ELF executables and ELF object files.

    BTW a cleared data segment don't need to be fetched from disk by the virtual memory subsystem, and that might make things slightly faster. The kernel would just clear a page in RAM when it is paged in.

    So the .bss exist to avoid wasting disk space in executables (and to speedup initializing of zeroed data). Obviously, some mutable data is explicitly initialized to non-zero content (and that needs to sit in .data and takes some disk space in the executable). Constant immutable read-only data goes into .rodata (into the text segment, generally shared by several processes running the same program)

    You could configure your linker (e.g. with some linker script) to put all the data (even the cleared ones) in some explicit data segment (but doing so is just wasting disk space)...... Historically, Unix have been developed on machines with little but costly disk space (so wasting it was unthinkable at that time, hence the need of .bss; today you care less!)

    Read Levine's book Linkers and Loaders for more, and Advanced Linux Programming and Operating Systems : Three Easy Pieces.