While learning about storage classes in C, I came across the concept of linkage.
The site cpp.reference describes linkage in terms of translation units rather than source files
(.c files, which is what I initially expected).
Based on my understanding from "What exactly is a translation unit in C",
a translation unit is what results from preprocessing the source file (header file inclusion, macro expansion, etc.).
Can we call a .i file a translation unit?
I'm having a hard time understanding why storage class matters at the translation-unit level and, furthermore, how I can identify different translation units.
C is most often implemented by compiling one source file at a time, which produces an object module, and then linking object modules into an executable file. (Executable files may also involve dynamic libraries, and C may be used for other things than simple executable files. This answer gets at the gist of linking and does not cover complications.) Even if you execute a command that compiles multiple source files and produces an executable file, this is usually performed as a series of individual compilations followed by a link.
During compilation, the compiler makes use of all the identifiers declared in the source file. It uses these to understand types, to resolve references to objects stored on the stack, and more. Any fully resolved references result in machine code that refers to memory numerically, by offsets from some base address register (such as a stack pointer or frame pointer) or similar means. Once the use of an identifier is fully resolved, the machine code does not need the actual name, and the name may not be present in the object file.[1]
Because the object module does not contain the local identifiers of a translation unit (in a way that is normally used during linking, see footnote), one object module cannot use the local identifiers of another translation unit. The linker has no way to relate local identifiers between object modules.
When an identifier has external linkage, the compiler includes information about it in the object module. This information is available during linking. One object module can refer to an identifier that is defined in another object module, and the linker (or the program loader) can “fix up” the machine code where an identified entity is used. (When external references are used in C code, the compiler generates machine code that is incomplete; the addresses or offsets needed to execute the instruction are incomplete. Separately from the machine code, an object module includes a table of places in the machine code that need to be finished. That table also includes information about how to finish the machine code, including the name of the identifier that was used in the source code and the type of reference that appears in the machine code.)
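To make the distinction concrete, here is a minimal sketch of one source file containing identifiers of each kind. The file name `counter.c` and every identifier in it are hypothetical, chosen only for illustration.

```c
/* counter.c: a hypothetical source file, compiled as one translation unit. */

/* External linkage: the compiler records this name in the object module,
   so other object modules can refer to it. */
int shared_total = 0;

/* Internal linkage: known throughout this translation unit only. */
static int call_count = 0;

/* External linkage: functions have it unless declared static. */
int add_to_total(int amount)
{
    /* No linkage: a local variable is typically resolved to a register or
       a stack offset, so the linker never needs its name. */
    int previous = shared_total;

    shared_total = previous + amount;
    call_count++;
    return shared_total;
}

/* External linkage: exposes the internal counter through a function. */
int get_call_count(void)
{
    return call_count;
}
```

Another translation unit could declare `extern int shared_total;` and `int add_to_total(int);` and leave it to the linker to connect the references. It has no way to name `call_count` or `previous`.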
The C standard was written to accommodate this method of building C programs. The reason it uses “translation unit” is that we need a defined way of talking about compiling one source file. When a source file is compiled, more files than just that source file are usually involved, because most source files use #include to incorporate the contents of other files. The term “translation unit” is defined to mean the contents that result from combining the contents of the original source file and all included files.
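As a small illustration, consider the sketch below. The file name `area.c`, the macro `SQUARE`, and the function `square_area` are made up for this example. The translation unit is not these few lines as written; it is the text the compiler works on once the contents of `<stddef.h>` have replaced the `#include` line and the macro has been expanded.

```c
/* area.c: a hypothetical source file. */

/* The declarations in this header (including size_t) become part of the
   translation unit. */
#include <stddef.h>

/* Macro definitions are consumed by preprocessing. */
#define SQUARE(x) ((x) * (x))

size_t square_area(size_t side)
{
    /* After preprocessing this reads: return ((side) * (side)); */
    return SQUARE(side);
}
```

With GCC or Clang, the `-E` option prints the preprocessed text, which is one way to inspect what a given translation unit contains.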
Question: what do the terms current translation unit, other translation units, same translation unit, and all translation units mean in the context of linkage?
The current translation unit is the one being compiled.
Other translation units are translation units other than the one being compiled.
The “same translation unit” phrase you ask about refers to multiple declarations of one identifier. It is talking about a situation in which one translation unit contains multiple declarations of one identifier, and at least one declaration gives the identifier internal linkage and at least one declaration gives the identifier external linkage.
(Internal linkage is not used in linking object modules. It is something the compiler handles when compiling one translation unit, when multiple declarations for an identifier with internal linkage appear in different places in the translation unit.)
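A minimal sketch of several declarations of one identifier inside a single translation unit follows. The names `helper` and `use_helper` are hypothetical. It relies on the rule in the C standard's section on linkages of identifiers that an `extern` declaration takes on the linkage of a visible prior declaration, so all three declarations of `helper` here agree on internal linkage.

```c
/* First declaration: static gives helper internal linkage. */
static int helper(int x);

/* Second declaration: a prior declaration with internal linkage is visible,
   so this extern declaration also has internal linkage. */
extern int helper(int x);

int use_helper(int x)
{
    return helper(x) + 1;
}

/* Definition: still the same function, still internal linkage. */
static int helper(int x)
{
    return x * 2;
}

/* The reverse order would be a problem:
 *
 *     int other(void);          external linkage
 *     static int other(void);   internal linkage
 *
 * That gives one identifier both linkages in the same translation unit,
 * which the standard leaves undefined; compilers usually diagnose it.
 */
```

Everything here is settled by the compiler while it processes this one translation unit; the linker is not involved in matching up the declarations of `helper`.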
Can we call a .i file a translation unit?
Yes. (A compiler might not actually produce a .i file internally if you do not ask it to, but .i files are intended to make visible to the user the result of incorporating all the included files and doing preprocessing. These files are most often used for debugging or similar analysis.)
I'm having a hard time understanding why storage class matters at the translation-unit level and, furthermore, how I can identify different translation units.
Each source file you compile, with all of its included files, is a translation unit.
[1] There are complications here. Object modules may include debugging information, which provides information about the identifiers for a debugger to use. That debugging information is separate from the linking process that resolves external identifiers.