void foo(T, size_t size)(in T[size] data){...}
//vs
void foo(T, size_t size)(const ref T[size] data){...}
According to https://stackoverflow.com/a/271344/944430 it seems that in C++ pass by value can be faster in some situations. But D has a special keyword, `in`, and I am wondering when I should use it. Does `in` always result in a copy, or is it a compiler optimization? Are there any guidelines that I can follow that help me decide between `const ref` and `in`?
I would argue that you should never use `in` on function parameters. `in` is an artifact from D1 that was kept to reduce code breakage, but it was changed to be equivalent to `const scope`. So, every time you think of typing `in` on a function parameter, think of `const scope`, since that's what you're really doing. And `scope` currently only does anything with delegates. There, it tells the compiler that the function taking the delegate is not going to return it or assign it to anything, and that therefore no closure has to be allocated to hold the state of that delegate (so it improves efficiency in many cases for delegates). For all other types, it's completely ignored, which means that using it is meaningless (and potentially confusing). And if it ever does come to mean something for other types (e.g. it's been suggested that it should enforce that a pointer that's passed in as `scope` can't escape the function), then the semantics of your code could change in unexpected ways. Presumably, that change will be accompanied by the appropriate warnings when it happens, but why mark your code with a meaningless attribute that could gain a meaning later and thus force you to change your code? At this point, `scope` should only be used on delegates, so `in` should only be used on delegates, and you don't usually want `const` delegates. So, just don't use `in`.
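A minimal D sketch of the two points above, assuming a compiler where `in` on a parameter means `const scope` as described (the `static assert` only checks the `const` part, which holds either way). The names `total`, `callTwice`, and `countCalls` are hypothetical. `countCalls` can be `@nogc` only because the delegate parameter of `callTwice` is `scope`, so the lambda that captures `count` needs no GC-allocated closure.

```d
import std.stdio : writeln;

// Hypothetical helper: an `in` parameter is at least const inside the body.
int total(in int[] values)
{
    static assert(is(typeof(values) == const(int[])));
    int sum = 0;
    foreach (v; values)
        sum += v;
    return sum;
}

// Hypothetical helper: `scope` promises the delegate won't escape this
// function, so callers don't need a heap-allocated closure for it.
void callTwice(scope void delegate() @nogc dg) @nogc
{
    dg();
    dg();
}

// Compiles as @nogc only because callTwice's parameter is `scope`;
// without it, capturing `count` would require a GC-allocated closure.
int countCalls() @nogc
{
    int count = 0;
    callTwice(() { ++count; });
    return count;
}

void main()
{
    writeln(total([1, 2, 3]));  // 6
    writeln(countCalls());      // 2
}
```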
So, ultimately, what you're really asking is whether you should use `const` or `const ref`.
The short answer is that you generally shouldn't use `ref` unless you want to mutate the argument you're passing in. I would also point out that this question is meaningless for anything but structs and maybe static arrays, because classes are already reference types, and none of the built-in types (save for static arrays) cost much of anything to copy. The longer answer is...
Move semantics are built into D, so if you have a function that takes its argument by value, e.g.
auto foo(Bar bar) { ... }
then it will move the argument if it can. If you pass it an lvalue (a value that can be on the left-hand side of an assignment), then that value is going to be copied, except maybe in circumstances where the compiler is able to determine that it can optimize the copy away (e.g. when the variable is never used after that function call), but that's going to depend on the compiler and compiler flags used. So, passing a variable to a function by value will usually result in a copy. However, if you pass the function an rvalue (a value that can't go on the left-hand side of an assignment), then it will move that object rather than copying it. This is different from C++, where move semantics were not introduced until C++11, and even then, they require move constructors. D instead uses postblit constructors, which run after a bitwise copy of the object, so the compiler can move an object by default simply by blitting it, without calling any user-defined code. A couple of previous SO questions on that:
Does D have something akin to C++0x's move semantics?
Questions about postblit and move semantics
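A sketch that makes the copy/move difference visible by counting postblit calls. `Bar`, `takesByValue`, `makeBar`, and the two counting functions are hypothetical; the lvalue count assumes the compiler does not elide the copy, which is what you'd normally expect here since the postblit has a visible side effect.

```d
import std.stdio : writeln;

// Hypothetical struct that counts how often its postblit runs.
struct Bar
{
    int[8] payload;
    static int copies;         // thread-local counter
    this(this) { ++copies; }   // postblit: runs after a bitwise copy
}

void takesByValue(Bar bar) {}

Bar makeBar() { return Bar(); }   // returns an rvalue

// Number of postblits triggered by passing an lvalue by value.
int copiesForLvalue()
{
    Bar.copies = 0;
    Bar local;
    takesByValue(local);        // lvalue: copied, postblit runs
    return Bar.copies;
}

// Number of postblits triggered by passing an rvalue by value.
int copiesForRvalue()
{
    Bar.copies = 0;
    takesByValue(makeBar());    // rvalue: moved, no postblit
    return Bar.copies;
}

void main()
{
    writeln(copiesForLvalue());  // 1
    writeln(copiesForRvalue());  // 0
}
```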
So, yes, there are cases in D where passing by `ref` would avoid a copy, but in D, `ref` always requires an lvalue (even with `const`). So, if you start putting `ref const(T)` everywhere like you'd do `const T&` in C++, you're going to have a lot of functions which are really annoying to call, because every temporary will have to be assigned to a variable first to call the function. So, you should seriously consider only ever using `ref` for when you want to mutate a variable that's passed in and not for efficiency. Certainly, your default should be to not pass by `const ref`, but if you do need that extra efficiency, you have two options:
1. Overload on `ref`-ness so that you have one overload that takes by `const ref` and one that takes by value, so that lvalues get passed to the first without being copied, and rvalues get passed to the second without needing an extraneous variable. e.g.
auto foo(const Bar bar) { return foo(bar); }
auto foo(ref const(Bar) bar) { ... }
And that's a bit annoying but works well enough when you only have one parameter with `ref`. However, you get a combinatorial explosion of overloads as more `ref` parameters are added. e.g.
auto foo(const Bar bar, const Glop glop) { return foo(bar, glop); }
auto foo(ref const(Bar) bar, const Glop glop) { return foo(bar, glop); }
auto foo(const Bar bar, ref const(Glop) glop) { return foo(bar, glop); }
auto foo(ref const(Bar) bar, ref const(Glop) glop) { ... }
So, that works to a point, but it's not particularly pleasant. And if you define the overloads like I did here, then it also has the downside that the rvalues end up being passed to a wrapper function (adding an extra function call, though one that should be quite inlinable). That means that they're now passed by `ref` to the main overload, and if one of those parameters is passed to another function or returned, the compiler can't do a move, whereas if `ref` hadn't been involved, it could have. That's one of the reasons that it's now argued that you shouldn't use `const T&` as heavily in C++11 as you would have in C++98.
You can get around that problem by duplicating the function body for each overload, but that obviously creates a maintenance problem as well as creating code bloat.
2. Use `auto ref`, which basically does that for you, but the function has to be templated. e.g.
auto foo()(const auto ref Bar bar, const auto ref Glop glop) { ... }
So, now you only have one overload, but it still generates all of those overloads with the full code underneath the hood every time the template is instantiated with a different combination of `ref`-ness. So, your code is cleaner, but you still get more bloat, and if you need to do this with a virtual function, then you're out of luck and have to go back to the more explicit overload solution, because templated functions can't be virtual.
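A sketch of both options for a single parameter, assuming default compiler settings (where rvalues can't bind to `ref`). `Bar`, `makeBar`, `takesConstRef`, and `fooAutoRef` are hypothetical names; `foo` follows the overload pair above but returns a string so you can see which overload actually ran, and `__traits(isRef, ...)` reports which `auto ref` instantiation was used.

```d
import std.stdio : writeln;

// Hypothetical struct standing in for something expensive to copy.
struct Bar { int[64] data; }

Bar makeBar() { return Bar(); }   // returns an rvalue

// By default, ref (even const ref) only binds to lvalues.
string takesConstRef(ref const(Bar) bar) { return "ref"; }
static assert(!__traits(compiles, takesConstRef(makeBar())));

// Option 1: overload on ref-ness. Lvalues pick the ref overload; rvalues
// pick the by-value one, whose parameter is itself an lvalue and so is
// forwarded to the ref overload.
string foo(ref const(Bar) bar) { return "ref"; }
string foo(const Bar bar) { return "value -> " ~ foo(bar); }

// Option 2: auto ref (templates only). The compiler instantiates a ref
// version for lvalues and a by-value version for rvalues.
string fooAutoRef()(auto ref const(Bar) b)
{
    return __traits(isRef, b) ? "ref" : "value";
}

void main()
{
    Bar lv;
    writeln(foo(lv));                // ref
    writeln(foo(makeBar()));         // value -> ref
    writeln(fooAutoRef(lv));         // ref
    writeln(fooAutoRef(makeBar()));  // value
}
```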
So, in general, trying to have your functions accept `const ref` for efficiency reasons just gets ugly. The fact that D has move semantics built in reduces the need for it (just like with C++11, where it's now argued that passing by value is often better, thanks to move semantics and how the compiler optimizes them). And it's ugly enough to do in D in the general case that unless you actually get a performance boost that matters, it's probably not worth passing by `ref` just for efficiency. You should probably avoid using `ref` for efficiency unless you've actually measured a difference in performance that's worth the pain.
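If you do want to measure, a rough sketch with Phobos's `std.datetime.stopwatch.benchmark` could look like this. `Big`, `byValue`, `byConstRef`, and `measure` are hypothetical names, and the numbers you get will depend heavily on the compiler, the optimization flags, and the size of the struct; the optimizer may also inline the calls and remove the copy entirely.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// Hypothetical large struct: 1 KiB of payload.
struct Big { long[128] data; }

long byValue(Big b) { return b.data[0]; }
long byConstRef(ref const(Big) b) { return b.data[0]; }

Big big;     // module-level (thread-local) lvalue to pass in
long sink;   // accumulate results so the calls aren't trivially dead

void callByValue() { sink += byValue(big); }
void callByConstRef() { sink += byConstRef(big); }

// Time n calls of each variant; returns one Duration per function.
auto measure(uint n)
{
    return benchmark!(callByValue, callByConstRef)(n);
}

void main()
{
    auto times = measure(1_000_000);
    writeln("by value:     ", times[0]);
    writeln("by const ref: ", times[1]);
}
```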
The other thing to consider, separate from `ref`-ness, is that D's `const` is a lot more restrictive than C++'s `const` (e.g. casting away `const` and mutating is undefined behavior in D, and D's `const` is transitive). So, slapping `const` all over the place can sometimes become problematic, especially in generic code. Using it can be great for preventing accidental mutation or indicating that a function does not mutate its arguments, but don't just blithely slap it on every variable that shouldn't be mutated like you would in C++. Use it where it makes sense, but be aware that you will run into cases where D's `const` is too restrictive to be usable, even if C++'s `const` would have worked.
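A sketch of the two restrictions mentioned above: transitivity (a `const` struct's pointer member points to `const` data, unlike in C++, where only the pointer itself would become const) and its effect on generic code (a `const` range can't be iterated, because `popFront` has to mutate it). `Holder`, `canWriteThroughConstHolder`, and `IotaRange` are hypothetical names; `iota` and `isInputRange` are from Phobos.

```d
import std.range : iota, isInputRange;
import std.stdio : writeln;

struct Holder { int* ptr; }

// Transitive const: through a const Holder, the pointee is const too.
bool canWriteThroughConstHolder()
{
    int x;
    const Holder h = Holder(&x);
    static assert(is(typeof(h.ptr) == const(int*)));
    return __traits(compiles, *h.ptr = 1);
}

// Generic code: a const range is not an input range, since popFront mutates.
alias IotaRange = typeof(iota(0, 3));
enum constRangeIsInputRange = isInputRange!(const(IotaRange));

void main()
{
    writeln(canWriteThroughConstHolder());  // false
    writeln(isInputRange!IotaRange);        // true
    writeln(constRangeIsInputRange);        // false
}
```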
So, in most cases, when you want your function to take a `T`, you should default to it taking a plain `T`. And then if you know that efficiency is a concern, you can consider using some form of `ref` (probably favoring `auto ref` or `const auto ref` if you're not dealing with a virtual function). But default to not using `ref`. Your life will be much more pleasant that way.