I'm studying the Linux kernel source code.
I already have a basic grasp of assembly language: the usage of common instructions (such as mov, add, jmp, call, ...) and the difference between AT&T syntax and Intel syntax.
So for now, it isn't a big problem for me to understand roughly what this asm code is doing. But directives like .text and .data, which appear at the head and tail of the following code, confuse me a lot.
So, my direct question is: what is the meaning of the .text pair and the .data pair? My root question is: which assembler dialect is this syntax based on? I think it is a variant of Intel syntax, as there is no '$' before constants. But why is there a '#', and why '_start' instead of 'main'? Where can I find a complete reference for this assembly syntax?
Help me, please!
Thanks a lot!
.globl begtext, begdata, begbss, endtext, enddata, endbss
.text
begtext:
.data
begdata:
.bss
begbss:
.text
BOOTSEG = 0x07c0 ! original address of boot-sector
INITSEG = 0x9000 ! we move boot here - out of the way
SETUPSEG = 0x9020 ! setup starts here
entry _start
_start:
mov ah,#0x03
xor bh,bh
int 0x10
... ...
.text
endtext:
.data
enddata:
.bss
endbss:
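You will find a complete description of this syntax in the documentation of the assembler. Judging by the ! comments, the # prefix on immediate values and the entry directive, this file appears to be written for as86 (the assembler from the bin86 package that early Linux boot code was built with), so its manual is the place to look. Unlike machine instructions such as mov, add, jmp, call, which are fixed by the CPU,
directives and pseudo-instructions are not standardized; they depend on the preferences and experience of the authors of each assembler.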
The statement .globl begtext, begdata, begbss, endtext, enddata, endbss
declares that these labels (defined later in the source text) are global (also called public), i.e. they can be referenced from other program modules linked with the kernel.
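As a rough illustration of what "global" means at link time, here is a toy model in Python. It is only a sketch under stated assumptions: the module table, the addresses, the label local_loop and the function resolve are all hypothetical and are not part of any real assembler or linker.

```python
# Toy model of symbol visibility; this is NOT how a real linker is implemented.
# Each "module" maps a label to (address, is_global).
boot = {
    "begtext": (0x0000, True),      # listed in .globl, so other modules may use it
    "local_loop": (0x0010, False),  # hypothetical label that was not listed in .globl
}


def resolve(name, own_module, other_modules):
    """Look up a label: own labels first, then only the globals of other modules."""
    if name in own_module:
        return own_module[name][0]
    for module in other_modules:
        if name in module and module[name][1]:
            return module[name][0]
    raise LookupError(f"undefined symbol: {name}")


# A second, empty module can see begtext but not local_loop.
other = {}
print(hex(resolve("begtext", other, [boot])))
```

In this model a label that is not marked global stays private to the file that defines it, which is the effect the .globl line is meant to avoid for the six beg*/end* labels.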
The names .text, .data, .bss are not labels but directives which tell the assembler to redirect its output (emitted code and data) to a particular segment. An executable program file contains one segment with machine instructions (.text, called .code in some assemblers) and segments for data (.data, .rodata, .bss), but the source text doesn't have to be written in this order.
Imagine that Linus (or whoever wrote the source) is the boss who dictates the source code to his secretary (the assembler). He tells the secretary to emit three instructions which call BIOS service INT 0x10 with AH=0x03 to read the current cursor position:
.text
mov ah,#0x03
xor bh,bh
int 0x10
The secretary will write (emit) those instructions on a sheet of paper labeled .text. Then Linus decides to define a message, so he gives the secretary the directive .data, followed by the message definition:
.data
Message DB "Kernel is starting, please wait."
The secretary will grab another sheet of paper, label it .data and write the Message definition on it. When Linus decides to dictate more machine code, the secretary takes back the .text sheet and continues writing at the spot (origin) where it was interrupted, below the instruction INT 0x10. In this way they may alternate between output segments ad libitum and keep the data near the code which manipulates it (this is good for readability of the program). Finally, all sheets of paper will be stapled (linked) together, so all machine instructions end up next to each other in the .text segment, and similarly all data in the .data segment.
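The secretary analogy can be sketched as a toy Python model that keeps one list per "sheet" and appends each dictated line to whichever sheet is current. This is only an illustration: the function assemble, the assumption that output starts in .text, and the final mov ah,#0x0e line are hypothetical additions, and a real assembler tracks byte offsets rather than text lines.

```python
def assemble(lines):
    """Collect each source line under the section that was active when it was read."""
    sections = {}        # section name -> lines emitted into it, in order
    current = ".text"    # assumption: output starts in .text
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line in (".text", ".data", ".bss"):
            current = line                    # switch sheets; earlier content is kept
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(line)
    return sections


source = [
    ".text",
    "mov ah,#0x03",
    "xor bh,bh",
    "int 0x10",
    ".data",
    'Message DB "Kernel is starting, please wait."',
    ".text",
    "mov ah,#0x0e",  # hypothetical follow-up instruction, not from the original source
]

layout = assemble(source)
for name, body in layout.items():
    print(name, body)
```

Although the source alternates between sections, the fourth instruction lands directly after int 0x10 in the .text list. The same model explains the pairs in the question: a label emitted first into a section (begtext) marks where that section starts, and a label emitted last (endtext) marks where it ends.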