
How do I know when to use `const ref` or `in`?


void foo(T, size_t size)(in T[size] data){...}
//vs
void foo(T, size_t size)(const ref T[size] data){...}

According to https://stackoverflow.com/a/271344/944430 it seems that in C++ pass by value can be faster in some situations.

But D has a special keyword, `in`, and I am wondering when I should use it. Does `in` always result in a copy, or is it a compiler optimization?

Are there any guidelines that I can follow that help me decide between `const ref` and `in`?


Solution

  • I would argue that you should never use `in` on function parameters. `in` is an artifact from D1 that was kept to reduce code breakage but was changed to be equivalent to `const scope`. So, every time you think of typing `in` on a function parameter, think of `const scope`, since that's what you're really doing.

    And `scope` currently only does anything with delegates. There, it tells the compiler that the function taking the delegate is not going to return it or assign it to anything, and that therefore no closure has to be allocated to hold the state of that delegate (so it improves efficiency in many cases for delegates). For all other types, it's completely ignored, which means that using it is meaningless (and potentially confusing). And if it ever does come to mean something for other types (e.g. it's been suggested that it should enforce that a pointer that's passed in as `scope` can't escape the function), then the semantics of your code could change in unexpected ways. Presumably, that change will be accompanied by the appropriate warnings when it happens, but why mark your code with a meaningless attribute that could have meaning later and thus force you to change your code?

    At this point, `scope` should only be used on delegates, so `in` should only be used on delegates, and you don't usually want const delegates. So, just don't use `in`.
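
    As a rough illustration of what scope buys you on a delegate parameter, here is a minimal sketch. The names repeat and sumBelow are made up for this example. It leans on @nogc to make the effect visible: the delegate literal captures a local, and the call is accepted inside a @nogc function because the parameter is marked scope, so no GC closure is needed. Dropping scope from the parameter should make the compiler reject sumBelow.

    ```d
    // Hypothetical helper: calls dg for 0 .. n and does not keep dg afterwards.
    void repeat(int n, scope void delegate(int) @nogc dg) @nogc
    {
        foreach (i; 0 .. n)
            dg(i);
    }

    int sumBelow(int n) @nogc
    {
        int total = 0;
        // The literal captures `total`. Because the parameter is `scope`,
        // the captured state can stay on the stack instead of in a GC closure.
        repeat(n, (int i) { total += i; });
        return total;
    }

    void main()
    {
        assert(sumBelow(5) == 10); // 0 + 1 + 2 + 3 + 4
    }
    ```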

    So, ultimately, what you're really asking is whether you should use const or const ref.

    The short answer is that you generally shouldn't use ref unless you want to mutate the argument you're passing in. I would also point out that this question is meaningless for anything but structs and maybe static arrays, because classes are already reference types, and none of the built-in types (save for static arrays) cost much of anything to copy. The longer answer is...

    Move semantics are built into D, so if you have a function that takes its argument by value - e.g.

    auto foo(Bar bar) { ... }
    

    then it will move the argument if it can. If you pass it an lvalue (a value that can be on the left-hand side of an assignment), then that value is going to be copied except maybe in circumstances where the compiler is able to determine that it can optimize the copy away (e.g. when the variable is never used after that function call), but that's going to depend on the compiler and compiler flags used. So, passing a variable to a function by value will usually result in a copy. However, if you pass the function an rvalue (a value that can't go on the left-hand side of an assignment), then it will move that object rather than copying it. This is different from C++, where move semantics were not introduced until C++11, and even then, they require move constructors, whereas D uses postblit constructors, which lets moves be done by default. A couple of previous SO questions on that:

    Does D have something akin to C++0x's move semantics?
    Questions about postblit and move semantics
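
    To make the copy-versus-move difference concrete, here is a small sketch that counts postblit calls. The struct Bar and the helpers makeBar and take are invented for the illustration. The postblit runs when the lvalue is passed, and does not run when the rvalue is passed.

    ```d
    struct Bar
    {
        static int copies;       // counts postblit calls (illustration only)
        int[4] payload;
        this(this) { ++copies; } // postblit: runs after a copy, not after a move
    }

    Bar makeBar() { return Bar(); } // hypothetical factory producing an rvalue

    void take(Bar bar) {}           // by-value parameter

    void main()
    {
        Bar b;
        take(b);                 // lvalue argument: copied, postblit runs
        assert(Bar.copies == 1);

        take(makeBar());         // rvalue argument: moved, no postblit
        assert(Bar.copies == 1);
    }
    ```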

    So, yes, there are cases in D where passing by ref would avoid a copy, but in D, ref always requires an lvalue (even with const). So, if you start putting ref const(T) everywhere like you'd do const T& in C++, you're going to have a lot of functions which are really annoying to call, because every temporary will have to be assigned to a variable first to call the function. So, you should seriously consider only ever using ref for when you want to mutate a variable that's passed in and not for efficiency. Certainly, your default should be to not pass by const ref, but if you do need that extra efficiency, you have two options:

    1. Overload the function on ref-ness so that you have an overload that takes by const ref and one that takes by value so that the lvalues get passed to one without being copied, and the rvalues get passed to the other without needing an extraneous variable. e.g.
        auto foo(const Bar bar) { return foo(bar); } // bar is an lvalue here, so this forwards to the ref overload
        auto foo(ref const(Bar) bar) { ... }
    

    And that's a bit annoying but works well enough when you only have one parameter with ref. However, you get a combinatorial explosion of overloads as more ref parameters are added. e.g.

        // Each wrapper forwards its (now lvalue) parameters to the all-ref overload.
        auto foo(const Bar bar, const Glop glop) { return foo(bar, glop); }
        auto foo(ref const(Bar) bar, const Glop glop) { return foo(bar, glop); }
        auto foo(const Bar bar, ref const(Glop) glop) { return foo(bar, glop); }
        auto foo(ref const(Bar) bar, ref const(Glop) glop) { ... }
    

    So, that works to a point, but it's not particularly pleasant. And if you define the overloads like I did here, then it also has the downside that the rvalues end up being passed to a wrapper function (adding an extra function call, though one that should be quite inlinable). That means that they're now passed by ref to the main overload, and if one of those parameters is passed to another function or returned, the compiler can't do a move, whereas if ref hadn't been involved, then it could have. That's one of the reasons that it's now argued that you shouldn't use const T& heavily in C++11 like you would have done in C++98.

    You can get around that problem by duplicating the function body for each overload, but that obviously creates a maintenance problem as well as creating code bloat.
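
    For reference, here is a small sketch of how overloading on ref-ness dispatches, assuming default compiler settings (no preview switches that relax the rvalue rule). The names which and onlyRef are made up; each function just reports which overload ran. It also shows that a ref const parameter on its own rejects an rvalue.

    ```d
    struct Bar { int x; }

    // Two overloads that differ only in ref-ness.
    string which(ref const(Bar) bar) { return "ref"; }
    string which(const Bar bar)      { return "value"; }

    // A function with no by-value overload to fall back on.
    string onlyRef(ref const(Bar) bar) { return "ref"; }

    void main()
    {
        Bar b = Bar(1);
        assert(which(b) == "ref");        // lvalue: the ref overload is preferred
        assert(which(Bar(2)) == "value"); // rvalue: only the by-value overload matches

        // With default settings, an rvalue cannot bind to ref, even ref const.
        static assert(!__traits(compiles, onlyRef(Bar(2))));

        auto tmp = Bar(2);                // the extra variable the text complains about
        assert(onlyRef(tmp) == "ref");
    }
    ```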

    2. The alternative is to use auto ref, which basically does that for you, but the function has to be templated. e.g.
        auto foo()(const auto ref Bar bar, const auto ref Glop glop) { ... }
    

    So, now you only have one overload, but it still generates all of those overloads with the full code under the hood every time the template is instantiated with a different combination of ref-ness. So, your code is cleaner, but you still get more bloat, and if you need to do this with a virtual function, then you're out of luck and have to go back to the more explicit overload solution, because templated functions can't be virtual.
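
    A quick way to see the separate instantiations is __traits(isRef, ...), which reports whether a given auto ref parameter ended up as ref in the current instantiation. The function passedByRef below is a made-up name for this sketch.

    ```d
    struct Bar { int x; }

    // The empty template parameter list `()` makes this a template,
    // which is what allows `auto ref` on the parameter.
    bool passedByRef()(const auto ref Bar bar)
    {
        return __traits(isRef, bar);
    }

    void main()
    {
        Bar b = Bar(1);
        assert(passedByRef(b));       // lvalue: instantiated with a ref parameter
        assert(!passedByRef(Bar(2))); // rvalue: instantiated with a by-value parameter
    }
    ```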

    So, in general, trying to have your functions accept const ref for efficiency reasons just gets ugly. The fact that D has move semantics built in reduces the need for it (just like with C++11, it's now argued that passing by value is often better, thanks to move semantics and how the compiler optimizes them). And it's ugly enough to do in D in the general case that unless you actually get a performance boost that matters, it's probably not worth passing by ref just for efficiency. You should probably avoid using ref for efficiency unless you've actually measured a difference in performance that's worth the pain.

    The other thing to consider - separate from ref-ness - is that D's const is a lot more restrictive than C++'s const (e.g. casting away const and mutating is undefined behavior in D, and D's const is transitive). So, slapping const all over the place can sometimes become problematic - especially in generic code. So, using it can be great for preventing accidental mutation or indicating that a function does not mutate its arguments, but don't just blithely slap it everywhere that shouldn't be mutating the variable like you would in C++. Use it where it makes sense, but be aware that you will run into cases where D's const is too restrictive to be usable, even if C++'s const would have worked.
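
    As a small sketch of what transitive means in practice (the struct Node and its field target are invented for this example): once you hold a const Node, everything reachable through it is const as well, so you can't write through its pointer the way you could through a C++ const object holding an int*.

    ```d
    struct Node
    {
        int* target; // mutable pointer when the Node itself is mutable
    }

    void main()
    {
        int value = 1;
        const Node n = Node(&value);

        // Transitivity: through a const Node, the pointer and what it points to are const.
        static assert(is(typeof(n.target) == const(int*)));
        static assert(!__traits(compiles, { *n.target = 2; }));

        // Reading through the const view is still fine.
        assert(*n.target == 1);
    }
    ```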

    So, in most cases, when you want your function to take a T, you should default to it taking a plain T. And then if you know that efficiency is a concern, you can consider using some form of ref (probably favoring auto ref or const auto ref if you're not dealing with a virtual function). But default to not using ref. Your life will be much more pleasant that way.