How to make C++ be as economical with memory as C, while still using STL?

These C and C++ programs are equivalent (as I'm using -fno-exceptions, below):

"loop.c":

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

const int N = 1024 * 4;

int main(void) {
    double* v = malloc(N * sizeof(double));
    if(!v) return 1;

    for(int i = 0; i < N; i++)
        v[i] = sin(i);

    double sum = 0;

    for(int i = 0; i < N; i++)
        for(int j = 0; j < N; j++)
            for(int k = 0; k < N; k++)
                sum += v[i] + 2 * v[j] + 3 * v[k];

    printf("sum = %f\n", sum);

    free(v);
}

"loop.cpp":

#include <vector>
#include <stdio.h>
#include <math.h>

const int N = 1024 * 4;

int main() {
    std::vector<double> v(N);

    for(int i = 0; i < N; i++)
        v[i] = sin(i);

    double sum = 0;

    for(int i = 0; i < N; i++)
        for(int j = 0; j < N; j++)
            for(int k = 0; k < N; k++)
                sum += v[i] + 2 * v[j] + 3 * v[k];

    printf("sum = %f\n", sum);
}

But when I run them on my system (compiled with gcc -O2 -lm and g++ -O2 -fno-exceptions), the C version uses 800K, while the C++ version uses 1600K and sometimes jumps to 3300K halfway through execution. I haven't noticed the C version do this. (I'm looking at "RES" in top while the program runs.)
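Watching top by eye makes it easy to miss a short-lived jump, so the peak can also be read from inside the process. This is a minimal sketch, assuming Linux, where getrusage reports ru_maxrss in kilobytes (other systems may use different units). The names peak_rss_kib and report_rss are hypothetical helpers, not part of the original programs.

```cpp
#include <sys/resource.h>  // getrusage (POSIX)
#include <cstdio>

// Peak resident set size of the calling process.
// Assumption: Linux, where ru_maxrss is reported in kilobytes.
// Returns -1 if getrusage fails.
long peak_rss_kib() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_maxrss;
}

// Hypothetical helper: print a labeled checkpoint to stderr so that it
// does not mix with the program's normal output on stdout.
void report_rss(const char* label) {
    std::fprintf(stderr, "%s: peak RSS = %ld KiB\n", label, peak_rss_kib());
}
```

Calling report_rss("after fill") after the first loop and report_rss("after sum") before printf, in both loop.c and loop.cpp, should show whether the jump comes from the program itself or from start-up work done before main.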

I wonder if there is some compiler setting, environment variable, or something else that could make the C++ version as economical with memory as its C equivalent, while still using the STL (std::vector, specifically).

I hope such a setting exists, because one of the design principles of C++ is:

What you don’t use, you don’t pay for (in time or space) and further: What you do use, you couldn’t hand code any better.

To keep the question focused: I'm not interested in switching to alternative libstdc++ implementations, but I can use Clang instead of GCC, if that helps.

Update: I did some more experimenting:

"run.sh":

for i in `seq 1000`; do
    nice ./a.out &
done

followed by

$ killall a.out

and looked at how the overall memory usage changes. As some commenters suggested, the overhead is indeed mostly shared between processes, although some per-process overhead remains: the per-process memory usage is 200K for the C version and 250K for the C++ version.

On the other hand, replacing vector with new and delete, or with malloc and free, has little or no effect. The overhead here comes from using C++ rather than from vector.
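For reference, the new/delete variant looks roughly like this. It is a sketch rather than the exact file I compiled: the loop body is the same as in loop.cpp, but the size is passed as a parameter n (an assumption, so that small sizes can be tried quickly), and weighted_sum is a hypothetical name.

```cpp
#include <cmath>
#include <new>  // std::nothrow

// Same computation as loop.cpp, with the buffer managed by new[]/delete[]
// instead of std::vector. The original program uses n = 1024 * 4.
// Returns NaN if the allocation fails.
double weighted_sum(int n) {
    // nothrow form, so a failed allocation yields a null pointer
    // (mirrors the malloc check in loop.c and works with -fno-exceptions).
    double* v = new (std::nothrow) double[n];
    if (!v) return std::nan("");

    for (int i = 0; i < n; i++)
        v[i] = std::sin(i);

    double sum = 0;

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                sum += v[i] + 2 * v[j] + 3 * v[k];

    delete[] v;
    return sum;
}
```

Built with g++, this still links against the shared libstdc++ even though no container is used, which is consistent with the overhead not being specific to vector.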

Solution

After some more experimentation, I noticed that using -static-libstdc++ eliminates the differences in memory consumption. I think this answers the question as originally stated.

One small nit, though: g++ -O2 -fno-exceptions -static-libstdc++ -s produces an executable about 6x larger than the one gcc -O2 -s -lm produces. There is a way to fix that: recompile libstdc++.a itself with size-reducing flags (I used -fno-exceptions -fno-rtti). This makes the size of the binary very similar to that of the C version.