I'm writing a C extension to my Python program for speed purposes, and running into some very strange behaviour trying to pass in a 3-dimensional numpy array. It works with a 2-dimensional array, but I'm sure I'm screwing something up with the pointers trying to get it to work with the 3rd dimension. But here's the weird part. If I just pass in a 3-D array, it crashes with a Bus Error. If (in Python) I create my variable as a 2D array first, and then overwrite it with a 3D array, it works perfectly. If the variable is an empty array first and then a 3D array, it crashes with a Seg Fault. How can that possibly happen?
Also, can anyone help me get a 3D array working? Or should I just give up and pass in a 2D array and reshape it myself?
Here's my C code:
static PyObject* func(PyObject* self, PyObject* args) {
PyObject *list2_obj;
PyObject *list3_obj;
if (!PyArg_ParseTuple(args, "OO", &list2_obj, &list3_obj))
return NULL;
double **list2;
double ***list3;
//Create C arrays from numpy objects:
int typenum = NPY_DOUBLE;
PyArray_Descr *descr;
descr = PyArray_DescrFromType(typenum);
npy_intp dims[3];
if (PyArray_AsCArray(&list2_obj, (void **)&list2, dims, 2, descr) < 0 || PyArray_AsCArray(&list3_obj, (void ***)&list3, dims, 3, descr) < 0) {
PyErr_SetString(PyExc_TypeError, "error converting to c array");
return NULL;
}
printf("2D: %f, 3D: %f.\n", list2[3][1], list3[1][0][2]);
}
And here's my Python code that calls the above function:
import cmod, numpy
l2 = numpy.array([[1.0,2.0,3.0], [4.0,5.0,6.0], [7.0,8.0,9.0], [3.0, 5.0, 0.0]])
l3 = numpy.array([[2,7, 1], [6, 3, 9], [1, 10, 13], [4, 2, 6]]) # Line A
l3 = numpy.array([]) # Line B
l3 = numpy.array([[[2,7, 1, 11], [6, 3, 9, 12]],
[[1, 10, 13, 15], [4, 2, 6, 2]]])
cmod.func(l2, l3)
So, if I comment out both Line A and B, it crashes with a Bus error. If Line A is there, but Line B is commented out, it runs correctly with no errors. If Line B is there but Line A is commented out, it prints the correct numbers but then Seg faults. Finally if both lines are present it also prints the correct numbers and then Seg faults. What in the hell is going on here?
EDIT: Ok. Wow. So I was using int
in Python but calling them double
in C. And that was working fine with 1D and 2D arrays. But not 3D. So I changed the Python definition of l3 to be floats, and now it all works fantastically (Thank you very much Bi Rico).
But now, more strange behaviour with Lines A & B! Now if both Lines are commented out, the program works. If Line B is present but A is commented out, it works, and ditto if both are uncommented. But if Line A is present and B is commented out, I get that fantastic Bus error again. I'd really like to avoid these in the future, so does anyone have any clue why the declaration of a Python variable can have this kind of impact?
EDIT 2: Well, as insane as these errors are, they're all due to the 3-dimensional numpy array I pass in. If I only pass in 1- or 2-D arrays, it behaves as expected, and manipulation of the other Python variables does nothing. This leads me to believe that the problem lies somewhere in Python's reference counting. In the C-code the reference count is decreased more than it should for the 3-D arrays, and when that function returns Python tries to clean up objects, and attempts to delete a NULL pointer. This is just my guess, and I've tried to Py_INCREF();
everything I could think of to no avail. I guess I'll just be using a 2D array and reshaping it in C.
I already mentioned this in a comment, but I hope flushing it out a little helps make it more clear.
When you're working with numpy arrays in C it's good to be explicit about the typing of your arrays. Specifically it looks like you're declaring your pointers as double ***list3
, but they way you're creating l3
in your python code you'll get an array with dtype npy_intp
(I think). You can fix this by explicitly using the dtype when creating your arrays.
import cmod, numpy
l2 = numpy.array([[1.0,2.0,3.0],
[4.0,5.0,6.0],
[7.0,8.0,9.0],
[3.0, 5.0, 0.0]], dtype="double")
l3 = numpy.array([[[2,7, 1, 11], [6, 3, 9, 12]],
[[1, 10, 13, 15], [4, 2, 6, 2]]], dtype="double")
cmod.func(l2, l3)
Another note, because of the way python works it's nearly impossible for "line A" and "line B" to have any effect on the C code what so ever. I know that this seems to conflict with your empirical experience, but I'm pretty sure on this point.
I'm a little less sure about this, but based on my experience with C, bus-errors and segfaults are not deterministic. They depend on memory allocation, alignment, and addresses. In some situation code seems to run fine 10 times, and fails on the 11th run even though nothing has changed.
Have you considered using cython? I know it's not an option for everyone, but if it is an option you could get nearly C level speedups using typed memoryviews.