string assembly low-level null-terminated

Newlines and null-terminated strings in assembly

I am new to assembly and I was looking at a hello world program:

section .data
    hello:     db 'Hello, World!',10    ; 'Hello, World!' plus a linefeed character
    helloLen:  equ $-hello             ; Length of the 'Hello, World!' string, linefeed included

section .text
    global _start

_start:
    mov eax,4            ; The system call for write (sys_write)
    mov ebx,1            ; File descriptor 1 - standard output
    mov ecx,hello        ; Put the offset of hello in ecx
    mov edx,helloLen     ; helloLen is a constant, so we don't need to say
                         ;  mov edx,[helloLen] to get its actual value
    int 80h              ; Call the kernel
    mov eax,1            ; The system call for exit (sys_exit)
    mov ebx,0            ; Exit with return "code" of 0 (no error)
    int 80h              ; Call the kernel

My question is about the string db 'Hello, World!',10. As I understand it, this is the "Hello, World!" string terminated by a linefeed. But from what I know, strings have to be terminated by a null character, and another source taught me to create strings as db "Hello, World!",0.

How do you add a newline/linefeed at the end (or inside) of a string and also the null terminator at the very end?

(I do understand that the given program prints the string using the given length, and that this is an option. But null-terminated strings seem better: having to give the length for every string you ever write doesn't seem very good.)

Solution

(I accidentally overlooked your last sentence, so the first part of this answer is explaining what you probably already know. I'll just leave it in case it is helpful to some future reader.)

from what i know strings have to be terminated by a null character

It's true that a null terminator is a very common way to mark the end of a string, and that many functions are designed to work with strings in this form; e.g. all the str* functions in the standard C library. It's common enough that people often use the word "string" to mean "null-terminated string" without further clarification.

But the write system call is not one of those functions. It doesn't rely on a null terminator to mark the end of the data to be written; rather, it uses the length argument (which you have put in the edx register) to know how many bytes from memory should be written to the file descriptor. If you think about it, it has to be this way: if write used null-terminated strings, then it would be impossible to use it to write a binary file that needs to contain null bytes.

So in your program, since the only thing you do with your hello message is to pass it to the write system call, there is no need to include a terminating null byte. It would just waste a byte of memory.
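To illustrate the point about null bytes, here is a minimal sketch in the same 32-bit Linux int 80h style as your program. The label names blob and blobLen are made up for this example. Because write takes its length from edx, all four bytes should reach standard output, including the 0 in the middle:

```nasm
section .data
    blob:     db 'A',0,'B',10      ; Four bytes: 'A', a null byte, 'B', a linefeed
    blobLen:  equ $-blob           ; 4; the null byte is counted like any other

section .text
    global _start

_start:
    mov eax,4            ; The system call for write (sys_write)
    mov ebx,1            ; File descriptor 1 - standard output
    mov ecx,blob         ; Address of the data
    mov edx,blobLen      ; Number of bytes to write, null byte included
    int 80h              ; Call the kernel
    mov eax,1            ; The system call for exit (sys_exit)
    mov ebx,0            ; Exit status 0
    int 80h              ; Call the kernel
```

If you pipe the output through a hex dump tool such as xxd, you should see the bytes 41 00 42 0a, which shows that write did not stop at the null byte.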

If you do want to include a terminating null byte (perhaps because you want to use this string with the str* functions, printf, etc.), then simply append a 0:

hello:     db 'Hello, World!', 10, 0

You didn't mention which assembler you are using, but it looks like NASM. The NASM documentation is pretty complete, and it explains that the db directive accepts an arbitrary comma-separated list of strings and numerical expressions.
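Since db takes a list, a linefeed can go in the middle of the data as well as at the end. The following is only a sketch, and the labels twoLines and twoLines2 are hypothetical. The second form assumes NASM's backquoted strings, which interpret C-style escapes such as \n:

```nasm
section .data
    ; "Line one", linefeed, "Line two", linefeed, null terminator
    twoLines:   db 'Line one',10,'Line two',10,0

    ; Same bytes, with the linefeeds written as \n inside a backquoted string
    twoLines2:  db `Line one\nLine two\n`,0
```

Single and double quotes do not process escapes in NASM, so 'Line one\n' would store a literal backslash followed by an n.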

Note that if you keep the helloLen: equ $-hello definition directly after this modified hello line, that length will include the terminating null, which is likely not what you want. But if you are using functions that detect the length from the null terminator, you probably won't need helloLen at all.
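If you want both, one option is to define helloLen before emitting the terminator, so the 0 is not counted. The sketch below does that and also shows what "detecting the length by the null terminator" looks like by hand. It assumes 32-bit Linux as in your program, and the local labels .count and .write are names chosen for this example:

```nasm
section .data
    hello:     db 'Hello, World!',10   ; Text plus linefeed
    helloLen:  equ $-hello             ; 14: measured before the terminator
               db 0                    ; Null terminator, not counted above

section .text
    global _start

_start:
    mov ecx,hello        ; Start of the string, also the buffer for sys_write
    xor edx,edx          ; Length counter starts at 0
.count:
    cmp byte [ecx+edx],0 ; Reached the null terminator?
    je .write            ; Yes: edx holds the length
    inc edx              ; No: count this byte and look at the next one
    jmp .count
.write:
    mov eax,4            ; sys_write; edx now equals helloLen
    mov ebx,1            ; File descriptor 1 - standard output
    int 80h              ; Call the kernel
    mov eax,1            ; sys_exit
    mov ebx,0            ; Exit status 0
    int 80h              ; Call the kernel
```

The loop is essentially what strlen does: it costs one pass over the string each time, which is the trade-off against keeping a precomputed length such as helloLen.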