Tags: assembly linux-kernel bootloader as86

What is the assembly language version (or dialect) that the Linux 0.11 bootsect.s source code is based on?


I'm learning linux kernel source code.

I already have a basic understanding of assembly language: the usage of common instructions (such as mov, add, jmp, call, ...) and the difference between AT&T syntax and Intel syntax.

So for now, it isn't a big problem for me to understand roughly what this asm code is doing. But directives like .text and .data, which appear at the head and tail of the following code, confuse me a lot.

So my direct question is: what is the meaning of the .text pair and the .data pair? My root question is: which asm dialect is this syntax based on? I think it is a variant of Intel syntax, as there is no '$' before constants. But why is there a '#' before constants, and why '_start' instead of 'main'? Where could I find a complete introduction to all of this asm grammar?

Help me, please!

Thanks a lot!

.globl begtext, begdata, begbss, endtext, enddata, endbss
.text
begtext:
.data
begdata:
.bss
begbss:
.text

BOOTSEG  = 0x07c0           ! original address of boot-sector
INITSEG  = 0x9000           ! we move boot here - out of the way
SETUPSEG = 0x9020           ! setup starts here

entry _start
_start:
    mov ah,#0x03        
    xor bh,bh
    int 0x10
... ...

.text
endtext:
.data
enddata:
.bss
endbss:
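The three `=` lines in this excerpt define real-mode segment values, not byte addresses. In real mode the CPU forms a physical address as segment × 16 + offset, so BOOTSEG = 0x07c0 corresponds to 0x7C00, where the BIOS loads the boot sector, and SETUPSEG lies 0x20 paragraphs (512 bytes, one sector) above INITSEG. A small Python sketch of that arithmetic; the `physical` helper is only an illustration and is not part of the kernel source:

```python
# Real-mode x86 address translation: physical = segment * 16 + offset.
# The segment values are copied from the bootsect.s excerpt.
BOOTSEG = 0x07C0   # original address of boot-sector
INITSEG = 0x9000   # where the boot sector is moved, out of the way
SETUPSEG = 0x9020  # where setup starts


def physical(segment: int, offset: int = 0) -> int:
    """Return the 20-bit physical address of segment:offset.

    The result wraps at 1 MiB, as on the original 8086.
    """
    return ((segment << 4) + offset) & 0xFFFFF


for name, seg in (("BOOTSEG", BOOTSEG), ("INITSEG", INITSEG), ("SETUPSEG", SETUPSEG)):
    print(f"{name:8} = {seg:#06x} -> physical {physical(seg):#07x}")

# SETUPSEG - INITSEG = 0x20 paragraphs = 0x20 * 16 bytes = one 512-byte sector.
print("setup offset from INITSEG:", physical(SETUPSEG) - physical(INITSEG), "bytes")
```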

Solution

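  • A complete introduction to this asm grammar can be found in the documentation of the assembler used to build the file. For Linux 0.11's bootsect.s that assembler is as86 (the 8086 assembler from the bin86 package, used together with the ld86 linker). Unlike machine instructions such as mov, add, jmp and call, directives and pseudo-instructions are not standardized: each assembler defines its own, following the conventions and preferences of its authors. In the as86 syntax of this file, for example, '#' marks an immediate operand (mov ah,#0x03) and '!' starts a comment.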

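    The statement .globl begtext, begdata, begbss, endtext, enddata, endbss declares that the listed labels (defined further down in the source text) are global, also called public, i.e. they can be referenced from other program modules linked together with this one. Because each pair is placed at the very beginning and the very end of its segment (begtext right after the first .text, endtext right after the last one, and likewise for .data and .bss), the labels mark where each segment of this module starts and ends.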
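    Conceptually, the assembler records every label in the module's symbol table but hands only the ones listed in .globl to the linker. The toy Python model below illustrates that effect; it is not how as86 or ld86 actually store symbols, and the addresses and the label `local_loop` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One assembled object file (a toy model, not the real as86 object format)."""
    name: str
    symbols: dict = field(default_factory=dict)  # label -> address
    exports: set = field(default_factory=set)    # labels listed in .globl

    def visible_symbols(self) -> dict:
        """Only labels declared with .globl are handed to the linker."""
        return {s: a for s, a in self.symbols.items() if s in self.exports}


def link(modules):
    """Build the linker's global symbol table; a global defined twice is an error."""
    table = {}
    for m in modules:
        for sym, addr in m.visible_symbols().items():
            if sym in table:
                raise ValueError(f"duplicate global symbol {sym!r}")
            table[sym] = addr
    return table


def resolve(table, sym):
    """What another module gets when it references `sym`."""
    if sym not in table:
        raise LookupError(f"undefined reference to {sym!r}")
    return table[sym]


# Hypothetical addresses; `local_loop` is a made-up label that is NOT in .globl.
boot = Module(
    "bootsect",
    symbols={"begtext": 0x0000, "endtext": 0x0100, "local_loop": 0x0040},
    exports={"begtext", "endtext"},
)
table = link([boot])
print(hex(resolve(table, "endtext")))  # found: declared with .globl
try:
    resolve(table, "local_loop")       # not exported, so other modules cannot see it
except LookupError as err:
    print(err)
```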

    The names .text, .data and .bss are not labels but directives which tell the assembler to redirect its output (emitted code and data) to a particular segment (section). An executable file typically contains one segment with machine instructions (.text, called .code in some assemblers) and segments for data (.data, .rodata, .bss), but the source text doesn't have to be written in that order. Imagine that Linus (or whoever wrote the source) is a boss who dictates the source code to his secretary (the assembler). He tells the secretary to emit three instructions which call the BIOS video service INT 0x10 with AH=0x03 to read the cursor position of display page 0 (BH=0):

    .text
     mov ah,#0x03        ! AH = 0x03: read cursor position
     xor bh,bh           ! BH = 0: display page 0
     int 0x10            ! call the BIOS video service
    

    The secretary writes (emits) those instructions on a sheet of paper labeled .text. Then Linus decides to define a message, so he dictates the directive .data to the secretary, followed by the message definition:

    .data    
     Message DB "Kernel is starting, please wait."
    

    The secretary grabs another sheet of paper, labels it .data and writes the Message definition on it. When Linus decides to dictate more machine code, the secretary takes the .text sheet back and continues writing at the spot (origin) where the writing was interrupted, below the instruction INT 0x10. In this way they can alternate between output segments at will and keep data near the code that manipulates it, which helps the readability of the program. Finally, all the sheets are stapled together (linked), so all machine instructions end up next to each other in the .text segment, and similarly all data in the .data segment.
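    The secretary analogy maps onto a simple data structure: one buffer (and therefore one location counter) per section, a "current section" pointer that .text/.data/.bss switch, and a final step that concatenates the buffers. The Python model below is only an illustration under those assumptions; real assemblers and linkers also do relocation, and .bss is normally reserved rather than stored. The class and method names are made up. The byte strings are 8086 encodings of the dictated instructions, plus a hypothetical `jmp $` placeholder, and the replay is bracketed by the begtext/endtext and begdata/enddata labels from bootsect.s.

```python
class ToyAssembler:
    """Hypothetical model: one output buffer per section plus a 'current' pointer."""

    ORDER = (".text", ".data", ".bss")  # order in which the sheets are stapled

    def __init__(self):
        self.buffers = {name: bytearray() for name in self.ORDER}
        self.current = ".text"
        self.labels = {}  # label -> (section, offset inside that section)

    def section(self, name):
        # A .text/.data/.bss directive only switches the active buffer.
        self.current = name

    def label(self, name):
        # A label records the current location counter of the active section.
        self.labels[name] = (self.current, len(self.buffers[self.current]))

    def emit(self, data):
        self.buffers[self.current] += data

    def link(self):
        """Concatenate sections in ORDER and turn label offsets into addresses."""
        image, base = bytearray(), {}
        for name in self.ORDER:
            base[name] = len(image)
            image += self.buffers[name]  # simplification: .bss is stored, not reserved
        addresses = {lab: base[sec] + off for lab, (sec, off) in self.labels.items()}
        return bytes(image), addresses


asm = ToyAssembler()
asm.section(".text")
asm.label("begtext")                           # head of file, as in bootsect.s
asm.section(".data")
asm.label("begdata")
asm.section(".text")
asm.emit(b"\xB4\x03")                          # mov ah,#0x03
asm.emit(b"\x30\xFF")                          # xor bh,bh
asm.emit(b"\xCD\x10")                          # int 0x10
asm.section(".data")
asm.emit(b"Kernel is starting, please wait.")  # the Message text
asm.section(".text")                           # resume right after int 0x10
asm.emit(b"\xEB\xFE")                          # jmp to itself (placeholder)
asm.section(".text")
asm.label("endtext")                           # tail of file
asm.section(".data")
asm.label("enddata")

image, addr = asm.link()
print("text size:", addr["endtext"] - addr["begtext"])
print("data size:", addr["enddata"] - addr["begdata"])
```

    Even though the source alternates between .text and .data, the output keeps all instructions contiguous (the placeholder jmp lands directly after int 0x10), and in this model endtext - begtext gives the size of the text section.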