c++ optimization

Can the compiler inline methods that generate objects within a loop?


Just a question for my own curiosity. I have heard many times that it's best to use the copy/destroy paradigm when writing a method. So if you have a method like this:

OtherClass MyClass::getObject(){
    OtherClass returnedObject;
    return returnedObject;
}

supposedly the compiler will optimize this by essentially inlining the method and constructing the object on the stack of the method that calls getObject. I'm wondering how that would work in a loop such as this:

for(int i=0; i<10; i++){
    list.push_back(myClass.getObject());
}

Would the compiler put 10 instances of OtherClass on the stack so it could inline this method and avoid the copy and destroy that would happen in unoptimized code? What about code like this:

while(!isDone){
    list.push_back(myClass.getObject());
    //other logic which decides whether or not to set isDone
}

In this case the compiler couldn't possibly know how many times getObject will be called, so presumably it can't pre-allocate anything on the stack. So my assumption is that no inlining is done and that every time the method is called I will pay the full cost of copying OtherClass - is that correct?

I realize that all compilers are different, and that this depends on whether the compiler believes this code is optimal. I'm speaking only in general terms: how are most compilers likely to respond? I'm curious how this sort of optimization is done.


Solution

  • for(int i=0; i<10; i++){
        list.push_back(myClass.getObject());
    }
    

    Would the compiler put 10 instances of OtherClass on the stack so it could inline this method and avoid the copy and destroy that would happen in unoptimized code?

    It doesn't need to put 10 instances on the stack just to avoid the copy and destroy... if there's space for one object to be returned, with or without Return Value Optimisation, then it can reuse that space 10 times - each time copying from that same stack space into the new heap-allocated memory obtained by the list's push_back.

    It would even be within the compiler's rights to allocate the new memory and arrange for myClass.getObject() to construct the objects directly in that memory.
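
    To make those two possibilities concrete, here is a rough sketch (this is conceptual, not literal compiler output, and it assumes list is something like a std::list<OtherClass> with C++11 available):

    // Possibility 1: one reused stack slot per iteration, plus a copy into heap storage.
    for(int i=0; i<10; i++){
        OtherClass tmp;          // getObject() inlined; (N)RVO constructs the result straight into tmp
        list.push_back(tmp);     // push_back copies tmp into the freshly allocated heap element
    }                            // tmp destroyed here; the same stack slot is reused next iteration

    // Possibility 2: no stack temporary at all - the object is built in place in the
    // container's own memory. You can ask for this explicitly (rather than hoping the
    // optimiser arranges it) by handing the container constructor arguments instead of
    // a finished object:
    for(int i=0; i<10; i++){
        list.emplace_back();     // default-constructs OtherClass directly in the new element
    }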

    Further, if the optimiser chooses to unroll the loop, it could potentially call myClass.getObject() 10 times - even with some overlap or parallelism - if it can somehow convince itself that that produces the same overall result. In that situation it would indeed need space for 10 return objects, and again it's up to the compiler whether that's on the stack or, through some miraculously clever optimisation, directly in the heap memory.

    In practice, I would expect compilers to need to copy from stack to heap - I doubt very much that any mainstream compiler is clever enough to arrange direct construction in the heap memory. Loop unrolling and RVO are common optimisations, though. But even if both kick in, I'd expect each call to getObject to serially construct a result on the stack, which is then copied to the heap.

    If you want to "know", write some code to test it with your own compiler. You can have the constructor write out the "this" pointer value.
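
    For example, a minimal harness along these lines (the class bodies are made up for illustration, and std::list is assumed for list) prints the "this" pointer every time an OtherClass is constructed, copied, moved or destroyed:

    #include <iostream>
    #include <list>

    struct OtherClass {
        OtherClass()                  { std::cout << "ctor      " << this << '\n'; }
        OtherClass(const OtherClass&) { std::cout << "copy-ctor " << this << '\n'; }
        OtherClass(OtherClass&&)      { std::cout << "move-ctor " << this << '\n'; } // C++11
        ~OtherClass()                 { std::cout << "dtor      " << this << '\n'; }
    };

    struct MyClass {
        OtherClass getObject(){
            OtherClass returnedObject;
            return returnedObject;
        }
    };

    int main(){
        MyClass myClass;
        std::list<OtherClass> list;
        for(int i=0; i<10; i++){
            list.push_back(myClass.getObject());
        }
    }

    Comparing the addresses shows whether each getObject() result lands in the same stack slot every iteration, and whether push_back then copies (or, with a movable type, moves) it into the heap-allocated node. Build it with and without optimisation (for instance -O0 versus -O2) and compare the output.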

    What about code like this:

    while(!isDone){
        list.push_back(myClass.getObject());
        //other logic which decides whether or not to set isDone
    }
    

    The more complex and less idiomatic the code is, the less likely it is that the compiler writers have been able and bothered to optimise for it. Here you're not even showing us the "other logic", so there isn't really a complexity level to speculate about. Try it with your compiler and optimisation settings and see....