In C++ we can ensure foo is called when we exit a scope by putting foo() in the destructor of a local object. That's what I think of when I hear "scope guard." There are plenty of generic implementations.
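For context, a typical generic implementation looks roughly like this (a minimal sketch; the scope_guard name and interface here are illustrative, not any particular library's API):

#include <utility>

template <class F>
class scope_guard {
public:
    explicit scope_guard(F f) : f_(std::move(f)) {}
    ~scope_guard() { f_(); }
    scope_guard(const scope_guard&) = delete;
    scope_guard& operator=(const scope_guard&) = delete;
private:
    F f_;  // the callable is stored by value and invoked on scope exit
};

// usage: scope_guard g{[&]{ foo(); }};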
I'm wondering—just for fun—if it's possible to achieve the behavior of a scope guard with zero overhead compared to just writing foo() at every exit point.
Zero overhead, I think:
{
    try {
        do_something();
    } catch (...) {
        foo();
        throw;
    }
    foo();
}
Overhead of at least 1 byte to give the scope guard an address:
{
    scope_guard<foo> sg;
    do_something();
}
Do compilers optimize away giving sg an address?
A slightly more complicated case:
{
    Bar bar;
    try {
        do_something();
    } catch (...) {
        foo(bar);
        throw;
    }
    foo(bar);
}
versus
{
    Bar bar;
    scope_guard<[&]{foo(bar);}> sg;
    do_something();
}
The lifetime of bar entirely contains the lifetime of sg and its held lambda (destructors are called in reverse order), but the lambda held by sg still has to hold a reference to bar. For example, int x; auto l = [&]{return x;}; gives sizeof(l) == 8 on my 64-bit system.
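For concreteness, here is a minimal program checking that claim (the 8 just reflects a 64-bit pointer and is implementation-defined; a capture-less lambda, by contrast, is an empty type):

#include <type_traits>
#include <cstdio>

int main() {
    int x = 0;
    auto capturing = [&] { return x; };    // stores a reference to x
    auto captureless = [] { return 42; };  // no captured state

    static_assert(!std::is_empty_v<decltype(capturing)>, "holds a reference");
    static_assert(std::is_empty_v<decltype(captureless)>, "empty type");

    // Typically prints 8 and 1 on a 64-bit system.
    std::printf("%zu %zu\n", sizeof(capturing), sizeof(captureless));
}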
Is there maybe some template metaprogramming magic that achieves the scope_guard sugar without any overhead?
If by overhead you mean how much space the scope-guard variable occupies, then zero overhead is possible as long as the function object is a compile-time value. I've coded a small snippet to illustrate this:
#include <iostream>

template <auto F>
class ScopeGuard {
public:
    ~ScopeGuard() { F(); }
};

void Cleanup() {
    std::cout << "Cleanup func..." << std::endl;
}

int main() {
    {
        char a = 0;
        ScopeGuard<&Cleanup> sg;
        char b = 0;
        std::cout << "Stack difference "
            << int(&a - &b - sizeof(char)) << std::endl;
    }
    {
        auto constexpr f = []{
            std::cout << "Cleanup lambda..." << std::endl; };
        char a = 0;
        ScopeGuard<f> sg;
        char b = 0;
        std::cout << "Stack difference "
            << int(&a - &b - sizeof(char)) << std::endl;
    }
}
Output:
Stack difference 0
Cleanup func...
Stack difference 0
Cleanup lambda...
The code above doesn't create even a single byte on the stack, because a class variable that has no fields can occupy zero bytes of stack; this is one of the obvious optimizations done by any compiler. Of course, if you take a pointer to such an object, the compiler is obliged to create a 1-byte memory object, but in your case you don't take the address of the scope guard.
You can see that not a single byte is occupied by looking at the Try it online! link above the code, which shows Clang's assembler output.
To have no fields at all, the scope guard class must only use a compile-time function object, such as a global function pointer or a capture-less lambda. These are the two kinds of objects used in my code above.
In the code above you can even see that I print the stack difference of char variables placed before and after the scope guard variable, to show that the scope guard actually occupies 0 bytes.
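If you want to verify the "no fields" property directly, a small compile-time check can be added to the snippet above (std::is_empty_v is the standard trait; ScopeGuard and Cleanup are the names from that snippet):

#include <type_traits>

// ScopeGuard<&Cleanup> has no non-static data members, so it is an empty type.
// Its sizeof is still reported as 1 (a complete object must have nonzero size),
// but as long as its address is never taken the compiler need not reserve any
// stack slot for it.
static_assert(std::is_empty_v<ScopeGuard<&Cleanup>>);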
Let's go a bit further and make it possible to use non-compile-time function objects.
For this we again create a class with no fields, but now store all the function objects in one shared vector with thread-local storage.
Again, as the class has no fields and we never take a pointer to the scope guard object, the compiler doesn't create a single byte for the scope guard on the stack.
Instead, a single shared vector is allocated on the heap. This way you can trade stack storage for heap storage if you're short on stack memory.
Having a shared vector also lets us use as little memory as possible, because the vector only holds as many elements as there are nested blocks currently using a scope guard. If all scope guards sit sequentially in different blocks, the vector holds just one element at a time, so a few bytes of memory serve all the scope guards that were used.
Why is the heap memory of the shared vector more economical than stack-stored scope guards? Because with stack memory, if you have several sequential blocks of guards:
void test() {
    {
        ScopeGuard sg(f0);
    }
    {
        ScopeGuard sg(f1);
    }
    {
        ScopeGuard sg(f2);
    }
}
then all three guards occupy triple the amount of stack memory, because for a function like test() above the compiler allocates stack memory for all of the function's variables, so for three guards it allocates triple the amount.
With the shared vector, the test() function above uses just one vector element at a time, so the vector has a size of at most 1 and needs only a single element's worth of memory to store the function object.
Hence, if you have many non-nested scope guards inside one function, the shared vector is much more economical.
Below I present the code snippet for the shared-vector approach, with zero fields and zero stack memory overhead. As a reminder, this approach allows non-compile-time function objects, unlike the solution in the first part of my answer.
#include <iostream>
#include <vector>
#include <functional>

class ScopeGuard2 {
public:
    static auto & Funcs() {
        thread_local std::vector<std::function<void()>> funcs_;
        return funcs_;
    }
    ScopeGuard2(std::function<void()> f) {
        Funcs().emplace_back(std::move(f));
    }
    ~ScopeGuard2() {
        Funcs().at(Funcs().size() - 1)();
        Funcs().pop_back();
    }
};

void Cleanup() {
    std::cout << "Cleanup func..." << std::endl;
}

int main() {
    {
        ScopeGuard2 sg(&Cleanup);
    }
    {
        auto volatile x = 123;
        auto const f = [&]{
            std::cout << "Cleanup lambda... x = "
                << x << std::endl;
        };
        ScopeGuard2 sg(f);
    }
}
Output:
Cleanup func...
Cleanup lambda... x = 123
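To see the "at most one element for sequential guards" claim in action, you can inspect Funcs() directly; here is a small test on top of the class above (the counts assume no other guards are alive on this thread):

void test() {
    {
        ScopeGuard2 sg([]{ std::cout << "first\n"; });
        std::cout << ScopeGuard2::Funcs().size() << std::endl;  // 1
    }
    {
        ScopeGuard2 sg([]{ std::cout << "second\n"; });
        std::cout << ScopeGuard2::Funcs().size() << std::endl;  // 1
    }
    {
        // Only nesting makes the vector grow:
        ScopeGuard2 outer([]{ std::cout << "outer\n"; });
        ScopeGuard2 inner([]{ std::cout << "inner\n"; });
        std::cout << ScopeGuard2::Funcs().size() << std::endl;  // 2
    }
}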