I have two scenarios, in both i allocate 78*2 sizeof(int) of memory and initialize it to 0. Are there any differences regards performances?
Scenario A:
int ** v = calloc(2 , sizeof(int*));
for (i=0; i<2; ++i)
{
v[i] = calloc(78, sizeof(int));
}
Scenario B:
int ** v = calloc(78 , sizeof(int*));
for (i=0; i<78; ++i)
{
v[i] = calloc(2, sizeof(int));
}
I supposed that in performance terms, it's better to use a calloc if an initialize array is needed, let me know if I'm wrong
First, discussing optimization abstractly has some difficulties because compilers are becoming increasingly better at optimization. (For some reason, compiler developers will not stop improving them.) We do not always know what machine code given source code will produce, especially when we write source code today and expect it to be used for many years to come. Optimization may consolidate multiple steps into one or may omit unnecessary steps (such as clearing memory with calloc
instead of malloc
immediately before the memory is completely overwritten in a for
loop). There is a growing difference between what source code nominally says (“Do these specific steps in this specific order”) and what it technically says in the language abstraction (“Compute the same results as this source code in some optimized fashion”).
However, we can generally figure that writing source code without unnecessary steps is at least as good as writing source code with unnecessary steps. With that in mind, let’s consider the nominal steps in your scenarios.
In Scenario A, we tell the computer:
int *
, clear them, and put their address in v
.int
, clear them, and put their addresses in the preceding int *
.In Scenario B, we tell the computer:
int *
, clear them, and put their address in v
.int
, clear them, and put their addresses in the preceding int *
.We can easily see two things:
int *
and immediately fill it with other data. That is wasteful; there is no need to set memory to zero before setting it to something else. Just set it to something else. Use malloc
for this, not calloc
. malloc
takes just one parameter for the size instead of two that are multiplied, so replace calloc(2, sizeof (int *))
with malloc(2 * sizeof (int *))
. (Also, to tie the allocation to the pointer being assigned, use int **v = malloc(2 * sizeof *v);
instead of repeating the type separately.)However, both scenarios allude to another issue. Presumably, the so-called array will be used later in the program, likely in a form like v[i][j]
. Using this as a value means:
v
.i
elements beyond that.j
elements beyond that.int
at that location.Let’s consider a different way to define v
: int (*v)[78] = malloc(2 * sizeof *v);
.
This says:
int
and put their address in v
.Immediately we see that involves fewer steps than Scenario A or Scenario B. But also look at what it does to the steps for using v[i][j]
as a value. Because v
is a pointer to an array instead of a pointer to a pointer, the computer can calculate where the appropriate element is instead of having to load an address from memory:
v
.i
•78 elements beyond that.j
elements beyond that.int
at that location.So this pointer-to-array version is one step fewer than the pointer-to-pointer version.
Further, the pointer-to-pointer version requires an additional fetch from memory for each use of v[i][j]
. Fetches from memory can be expensive relative to in-processor operations like multiplying and adding, so it is a good step to eliminate. Having to fetch a pointer can prevent a processor from predicting where the next load from memory might be based on recent patterns of use. Additionally, the pointer-to-array version puts all the elements of the 2×78 array together in memory, which can benefit the cache performance. Processors are also designed for efficient use of consecutive memory. With the pointer-to-pointer version, the separate allocations typically wind up with at least some separation between the rows and may have a lot of separation, which can break the benefits of consecutive memory use.