Can a simple multi-dimensional array dereference be slow?

I'm struggling to understand the output I'm getting from gprof.

I have a simple wrapper class around a 2D array that I need to be contiguous in memory.

Its constructor and the method I use to access values are:

Array2d::Array2d(int size, double initialValue)
: mSize(size) {
    data = new double *[size];
    data[0] = new double[size * size];

    for (int i = 1; i < size; ++i) {
        data[i] = data[0] + i * size;
    }

    for (int i = 0; i < size; ++i) {
        for (int j = 0; j < size; ++j) {
            data[i][j] = initialValue;
        }
    }
}


double &Array2d::operator()(int i, int j) {
    return data[i][j];
}

In the numerical code I'm working on, this is one output I obtained from gprof:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 49.33     34.80    34.80 43507867293     0.00     0.00  Array2d::operator()(int, int)
 18.05     47.53    12.73                             jacobi(Array2d&, Array2d&, int, int, double, double, double, int)

I'm surprised to see that almost 50% of the running time is spent accessing values from the array.

This Array2d class replaced the use of std::vector<double>, which was much faster.

What I am failing to understand here?

Solution

I'm surprised to see that almost 50% of the running time is spent accessing values from the array.

We cannot say much about this without seeing your code. It is easily possible to write code that has higher percentage of a single call. Consider

int main() { 
    while(true){ 
        foo(); 
    }
}

A profiler will tell you that close to 100% of the runtime is spend in foo. Does that mean foo is slow? No, we dont know.

The percentages you get from the profiler rather give you a hint to where the hot spots are in your code. If you know that 50% of the time is spend in one particular function call, then you know that this is a good canditate for improving performance. If you optimize this one function call you can achieve a speedup of up to x2 (cf amdahls law).

On the other hand, a function that uses only 0.1% of the total runtime can be made 1000-times faster without a significant impact on the total runtime.

Whether your element access is slow or fast, you can only know if you implement a second variant, leave everything else in the code as is and repeat the profiling. The variant that leads to a higher percentage performs worse.