c++ memory

If I initialize a struct that contains a pointer to itself on the stack and then return it, does the pointer point to invalid memory?


Consider the following:

  1. I have some sort of big class A. I want to extract part of its state/behaviour into class B and store it as a member variable.
  2. But in order to do its job, class B requires information from class A. That is: an instance of class A owns an instance of class B, and that instance of class B is associated with its owning instance of class A. I want to implement this by having the instance of class B hold a pointer to the instance of class A without managing it.

Also consider the following context:

  1. I am a novice in C++, but I have some experience working with garbage-collected languages.
  2. I do not yet have an intuition for when I should allocate. I understand that containers, self-referential data structures and polymorphic classes in general should be heap allocated, but I do not know the conventions yet. std::vector allocates its memory by itself, but I think domain classes should not manage their own memory?

Some example code, compiled by Visual Studio 2022 using the C++20 standard on a 64-bit platform:

// self-referential-struct.cpp
#include <string>
#include <iostream>

namespace self_ref_struct {
  struct Manager;

  struct Employee {
    std::string name{};
    Manager* manager{ nullptr };

    void greet();
  };

  struct Manager {
    std::string name{};
    Employee employee{};

    Manager(std::string name, std::string employee_name) : name{ name } {
      employee = Employee{ .name{employee_name}, .manager{this} };
    }

    void greet();
  };

  void Employee::greet() {
    std::cout << "Hello, I am employee " << name << " and the address of my manager is " << (void*)manager << ". His name is " << manager->name << ".\n";
  }

  void Manager::greet() {
    std::cout << "Hello, I am manager " << name << ". My address is: " << (void*)this << ". Employee, greet the user!\n";
    employee.greet();
  }

  Manager foo() {
    Manager mgr("Tom", "Adam"); // #1
    mgr.greet();

    return mgr; // #2
  }


  int _main() {
    auto mgr = foo(); // #3
    std::cout << "From _main\n";
    mgr.name = "Peter";
    mgr.greet();

    return 0;
  }
}

// main.cpp

namespace self_ref_struct {
  int _main();
}

int main() {
  self_ref_struct::_main();

  return 0;
}

// example output
/*
Hello, I am manager Tom. My address is: 000000D4FAEFF9D0. Employee, greet the user!
Hello, I am employee Adam and the address of my manager is 000000D4FAEFF9D0. His name is Tom.
From _main
Hello, I am manager Peter. My address is: 000000D4FAEFF9D0. Employee, greet the user!
Hello, I am employee Adam and the address of my manager is 000000D4FAEFF9D0. His name is Peter.
*/

My intuition tells me that something like this should happen:
At line #1 the program creates a variable A1 on the stack and stores a pointer to A1 inside it.
At line #2 a temporary variable A2 is created by the copy constructor; it holds a pointer to variable A1. Afterwards the function returns and the stack is popped, invalidating variable A1.
At line #3 (auto mgr = foo(); in _main) variable A3 is initialized from temporary variable A2, again by the copy constructor. It still holds the pointer to the now invalid variable A1.

I would expect this program to at best raise some error when trying to dereference the pointer to A1, and at worst to output some garbage data. I could understand it if A3 pointed to a not-yet-overwritten A1 variable. But it just works. Why?

I am of course also interested in what the idiomatic way to achieve something like this is, but I think I should ask that in another question once I have more context.


Solution

  • The way Manager is implemented now, this function relies on NRVO (named return value optimization):

    Manager foo() {
        Manager mgr("Tom", "Adam"); // #1
        mgr.greet();
    
        return mgr;                 // #2 copy/move _may_ be elided
    }
    

    NRVO is optional (the standard permits it but does not require it), so mgr may or may not be created directly in the variable receiving the return value. If you turn off this type of copy/move elision (gcc: -fno-elide-constructors, MSVC: /Zc:nrvo-) you will get an equally valid interpretation of the program in which the receiving variable contains a dangling pointer: its employee.manager still points at the local mgr that was destroyed when foo returned.

    You need to implement proper copy/move semantics (the rule of three/five/zero) so that employee.manager is re-pointed at the new object whenever a Manager is copied or moved.

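    "But it just works. Why?"

    By default, MSVC elides the copy/move in return mgr; as it's allowed to do, so mgr is constructed directly in the variable that receives the return value and employee.manager stays valid. Turn this optimization off and your program will read through a dangling pointer, which is undefined behavior. As written, its correctness depends on an optional optimization.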

    Demo