
How does std.string.toStringz work in dlang?


https://dlang.org/library/std/string/to_stringz.html

As I understand it, it cannot work:

toStringz creates an array on the stack and returns its pointer. After toStringz returns, the array on the stack is discarded and the pointer becomes invalid.

But I suppose it does work, since it is part of the standard library. So what is wrong with my understanding above?

Another related question:

What does scope return in the signature of this function mean? I visited https://dlang.org/spec/function.html but found no scope return there.


Solution

  • It does not create an array on the stack. If necessary, it allocates a new string on the GC heap.

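    The implementation first checks whether the existing string is already followed by a zero terminator, but only when it deems it possible to peek at that byte without a memory fault. It guesses this from the address of the byte just past the end of the string. If that address is a multiple of four, it doesn't risk it, because the byte could be the first byte of a new, possibly unreadable, memory page. If it is not a multiple of four, it reads that byte: fault (page) boundaries always fall on multiples of four, so the byte must be on the same page as the string's last character.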
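    To make the peek concrete, here is a simplified sketch in D. toStringzSketch is a hypothetical name, and its body only mirrors the behaviour described here; it is not copied from Phobos, whose real function is std.string.toStringz. It returns the input pointer itself when a terminator is found, and otherwise makes a terminated copy on the GC heap.

    ```d
    import core.stdc.string : strlen;
    import std.stdio : writeln;

    // Hypothetical stand-in for illustration; it mirrors the behaviour
    // described above but is not the Phobos source.
    immutable(char)* toStringzSketch(return scope string s) @trusted pure nothrow
    {
        // An empty string can use the (zero-terminated) empty literal.
        if (s.length == 0)
            return "".ptr;

        // Address of the byte just past the end of the slice.
        immutable(char)* p = s.ptr + s.length;

        // Peek only if p is not a multiple of 4: page boundaries are multiples
        // of 4, so p then lies on the same page as p - 1 (the last character).
        if ((cast(size_t) p & 3) != 0 && *p == '\0')
            return s.ptr; // already terminated: hand back the input itself

        // Otherwise copy onto the GC heap and append the terminator.
        auto copy = new char[s.length + 1];
        copy[0 .. s.length] = s[];
        copy[s.length] = '\0';
        return cast(immutable(char)*) copy.ptr;
    }

    void main()
    {
        string sentence = "hello world";
        string word = sentence[0 .. 5]; // followed by ' ', not by '\0'

        auto z = toStringzSketch(word);
        writeln(z != word.ptr); // true: a terminated copy had to be made
        writeln(strlen(z));     // 5
    }
    ```

    The byte after "hello" in that slice is a space, so the sketch takes the copying path there regardless of the pointer's alignment.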

    If there is a zero byte already there, it returns the input unmodified. That's what the return attribute in the signature means: the function may return that same input. (This is a new feature whose documentation was only written yesterday, and that documentation isn't even merged yet: https://github.com/dlang/dlang.org/pull/2536. The library docs, however, are rebuilt from the master branch, which is why the signature already shows it while the spec page does not.)

    Anyway, if there isn't a zero byte there, it allocates a new GC'd string, copies the existing one over, appends the zero, and returns that. That's why the note in the documentation warns about the C function keeping the pointer. If the C function keeps it after the call returns, the danger isn't the stack, it is the D garbage collector. D's GC does not scan memory owned by C code (unless it is explicitly told about it), so if the only remaining reference is stored there, the GC will consider the string unreferenced the next time it runs and free it, leading to a use-after-free bug.
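    If the C side really does hold on to the pointer, two common workarounds are sketched below, assuming you know when the C library stops using the string. The helpers pinnedCString and mallocCString are hypothetical names; GC.addRoot and GC.removeRoot come from core.memory, and malloc, free and memcpy from the C standard library bindings in core.stdc.

    ```d
    import core.memory : GC;
    import core.stdc.stdlib : free, malloc;
    import core.stdc.string : memcpy, strlen;
    import std.stdio : writeln;
    import std.string : toStringz;

    // Option 1 (hypothetical helper): keep the GC-allocated copy alive while
    // C holds the pointer by registering it as a GC root.
    const(char)* pinnedCString(string s)
    {
        const(char)* p = s.toStringz;
        GC.addRoot(p); // the GC treats p as referenced until GC.removeRoot(p)
        return p;
    }

    // Option 2 (hypothetical helper): give C a malloc'd copy that the GC never
    // manages; whoever owns it must free() it later.
    char* mallocCString(const(char)[] s) nothrow @nogc
    {
        auto p = cast(char*) malloc(s.length + 1);
        if (p is null)
            return null;
        memcpy(p, s.ptr, s.length);
        p[s.length] = '\0';
        return p;
    }

    void main()
    {
        auto pinned = pinnedCString("kept by C");
        // ... pass `pinned` to the C function that stores it ...
        writeln(strlen(pinned)); // 9
        GC.removeRoot(pinned);   // once C is done with the pointer

        auto owned = mallocCString("owned by C");
        writeln(strlen(owned));  // 10
        free(owned);
    }
    ```

    Which one fits depends on who frees the string: pinning keeps D's GC in charge, while the malloc'd copy hands ownership to whoever calls free.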

    The scope keyword in the signature is D's way of checking this, by the way: it means the argument will only be used within this function's scope (combined with return, it means the argument will only be used within this function's scope OR returned through this function). But that constraint is on toStringz's input; the C function you call probably doesn't carry that D language restriction, and thus an escape there would not be caught automatically.

    So to sum up the attributes again:

    scope - the argument will not leave the function's scope: it won't be assigned to a global, stored in an external structure, etc.

    return - the argument might be returned by the function.

    return scope - hybrid of the above; it will not leave the function's scope EXCEPT through the return value.
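    A small illustration of the three attributes, assuming a compiler that enforces scope checks in @safe code when given the -preview=dip1000 switch (without it, most of these escape checks are not enforced, though the code still compiles). The functions sum, firstHalf and stash are made up for the example.

    ```d
    // Compile with `dmd -preview=dip1000 scope_demo.d` to have the compiler
    // enforce the `scope` checks in @safe code.

    int[] global; // only used by the commented-out function below

    // scope: `a` must not outlive this call (no storing it anywhere).
    int sum(scope const(int)[] a) @safe
    {
        int total = 0;
        foreach (v; a)
            total += v;
        return total;
    }

    // return scope: `a` may leave the function, but only through the return value.
    int[] firstHalf(return scope int[] a) @safe
    {
        return a[0 .. a.length / 2];
    }

    // Expected to be rejected under -preview=dip1000: a scope parameter
    // escapes into a global variable.
    // void stash(scope int[] a) @safe { global = a; }

    void main() @safe
    {
        import std.stdio : writeln;

        int[] data = [1, 2, 3, 4];
        writeln(sum(data));       // 10
        writeln(firstHalf(data)); // [1, 2]
    }
    ```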