I'm working on a variation of the Nelder-Mead algorithm from Numerical Recipes that lets the user specify a maximum number of calls to the target function.
From my main routine, here is how I call the amoeba() function that implements the Nelder-Mead algorithm:
amoeba(p,y,params->ndim,params->tol,params->nmax,internal_funk,&nfunc);
But here is how it got implemented:
void amoeba(float **p, float y[], int ndim, unsigned nmax, float ftol, float (*funk)(float []), int *nfunk) {
....
}
Notice that I inverted the nmax and ftol arguments in my function call.
Astonishingly, amoeba() still works. Stepping through it in a debugger confirms that the right values were assigned to nmax and ftol.
My main routine #included a header file that declares the signature of the amoeba() routine, and compiling the main routine yielded no errors. However, the amoeba() source file did not include that header (a mistake of mine), so the compiler did not report any errors there either.
So how come my linked program still functions as it should, even though the arguments are not given in the right order?
UPDATE
@Binyamin Sharet, I'm showing here the assembly right before the call to amoeba and in amoeba. Does it support your hypothesis?
UPDATE 2
@Binyamin Sharet sure, here it is:
The reason is probably that floating-point parameters are not passed on the regular stack but on the coprocessor stack, so the relative order of those two arguments doesn't matter.
For example, the function expects this order of arguments:
| p                      |                             |
| y                      |                             |
| ndim                   |                             |
| nmax                   |                             |
| funk                   |                             |
| nfunk                  | ftol                        |
+------------------------+-----------------------------+
| stack                  | coprocessor stack           |
It doesn't matter if you switch nmax and ftol, because the order on each stack stays the same, and when amoeba reads them it isn't confused, for the same reason.
Edit
Reading the disassembly shows that I was a bit off: because of SSE, the instruction used to pass the float variable is movss, which you can see in both assembly listings you added, once writing to the xmm0 register (in the caller) and once reading from xmm0 (in the callee). So you can replace the words "coprocessor stack" with "xmm registers" and that's your situation.
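To make the register argument concrete, here is a small sketch that models how arguments could be handed out, assuming the x86-64 System V calling convention (an assumption on my part, though ftol landing in xmm0 is consistent with it). The arg type, the two tables, and reg_for() are hypothetical names made up for this illustration; the model only covers simple scalar arguments, where integers and pointers draw from one register sequence and floats from another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model: a scalar argument is either INTEGER class
   (ints, pointers) or SSE class (float, double). */
typedef enum { CLASS_INTEGER, CLASS_SSE } arg_class;

typedef struct {
    const char *name;
    arg_class   cls;
} arg;

#define AMOEBA_ARGC 7

/* Order used at the call site: ftol before nmax. */
static const arg caller_order[AMOEBA_ARGC] = {
    { "p", CLASS_INTEGER },    { "y", CLASS_INTEGER },
    { "ndim", CLASS_INTEGER }, { "ftol", CLASS_SSE },
    { "nmax", CLASS_INTEGER }, { "funk", CLASS_INTEGER },
    { "nfunk", CLASS_INTEGER },
};

/* Order used in the definition of amoeba(): nmax before ftol. */
static const arg callee_order[AMOEBA_ARGC] = {
    { "p", CLASS_INTEGER },    { "y", CLASS_INTEGER },
    { "ndim", CLASS_INTEGER }, { "nmax", CLASS_INTEGER },
    { "ftol", CLASS_SSE },     { "funk", CLASS_INTEGER },
    { "nfunk", CLASS_INTEGER },
};

/* Return the register that receives the argument called `name`,
   "stack" if its class has run out of registers, or NULL if no
   argument has that name. Each class keeps its own counter, so an
   SSE argument never consumes an integer register slot. */
static const char *reg_for(const arg *args, size_t n, const char *name)
{
    static const char *const int_regs[] = {
        "rdi", "rsi", "rdx", "rcx", "r8", "r9"
    };
    static const char *const sse_regs[] = {
        "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"
    };
    size_t next_int = 0, next_sse = 0;

    assert(args != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *reg;
        if (args[i].cls == CLASS_SSE)
            reg = next_sse < 8 ? sse_regs[next_sse++] : "stack";
        else
            reg = next_int < 6 ? int_regs[next_int++] : "stack";
        if (strcmp(args[i].name, name) == 0)
            return reg;
    }
    return NULL;
}
```

In this model, looking up ftol gives xmm0 for both caller_order and callee_order, and nmax gives rcx for both, which matches what you saw in the debugger. Keep in mind that calling a function through a mismatched declaration is still undefined behavior in C, and a convention that assigns registers by argument position would not be this forgiving, so including the header in the amoeba source file is the real fix.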