Why does resizing vectors in a vector of pointers seem faster than in a vector of vectors?

I'm comparing performance when resizing inner vectors in two cases: a std::vector<std::vector> versus a std::vector<std::vector*>. The second seems faster, and I'm trying to understand why.

int main (int argc, char* argv []) {
  const int n = 10;
  const int s2 (10000);
  {
    std::vector<int> tab [n];
    clock_t begt, endt;
    begt = clock ();
    {
      std::vector<int>* pv ((std::vector<int>*) tab);
      for (int i (0); i < n; ++i, ++pv) {
        pv->resize (s2);
      }
    }
    endt = clock ();
    std::cout << "vector resize duration == " << (unsigned int) (endt - begt) <<  " ticks" << std::endl;
  }

  {
    std::vector<int> *tab [n];
    for (int i (0); i != n; ++i) {
      tab [i] = new std::vector<int> ();
    }
    clock_t begt, endt;
    begt = clock ();
    {
      std::vector<int>** pv ((std::vector<int>**) tab);
      for (int i (0); i < n; ++i, ++pv) {
        (*pv)->resize (s2);
      }
    }
    for (int i (0); i != n; ++i) {
      delete tab [i];
    }
    endt = clock ();
    std::cout << "*vector resize duration== " << (unsigned int) (endt - begt) <<  " ticks" << std::endl;
  }
  return 0;
}

and typical results :

vector resize duration == 114 ticks
*vector resize duration== 78 ticks

Solution

I got curious about the claimed observations, and was able to reproduce them myself. Then I swapped the two alternative implementations, and now the tables were turned. The std::vector<int> *tab [n] version, which now executes first, was now remarkably slower than the other one, which now executes second.

The explanation is very simple. The C++ library reuses allocated memory. When you use memory, either directly yourself, via new, or indirectly, by allocating memory for values in containers, and then later release it, the C++ library does not release the freed memory back to the operating system, but instead reuses it the next time it needs more memory, without having to ask your operating system for more. There is no law that says that this is what the C++ library must do, but this is a typical behavior of modern C++ implementations.

So, what this is, is very simple. Your first benchmark includes time required to allocate a bunch of memory from the operating system. The C++ needed memory for those vectors, the first time, and that took the additional time. Your second benchmark section of code did not have to do that, because that memory was already allocated, and then freed, by the first benchmark; so the second benchmark simply resulted in your C++ library reusing the memory it already had, but was unused (this was obvious by looking at the syscall trace).

Typical results from simply repeating the first benchmark a second time:

vector resize duration == 191 ticks
vector resize duration == 48 ticks
*vector resize duration== 63 ticks