I want to create various small memoryviews in function temporarily and release them when the function is finished. Seems initialize the memoryview by malloc
is better, based on the comparisons performed in this answer, and actually is faster. As only part of the elements in each of these small memoryviews are assinged, what's the safe way to check whether it is assigned?
Here is the code and comparisons using malloc
and numpy's ndarray
:
## test_malloc.pyx
cimport cython
import numpy as np
cimport numpy as np
from libc.stdlib cimport malloc, free
np.import_array()
@cython.wraparound(False)
@cython.boundscheck(False)
cdef int _num_assign_malloc():
cdef:
Py_ssize_t n = 50
double *x = <double *> malloc(n * sizeof(double))
Py_ssize_t i
int k = 0
x[5] = 1.
x[8] = 1.
x[38] = 1.
for i in range(n):
if x[i] == 1.:
k += 1
free(x)
return k
@cython.wraparound(False)
@cython.boundscheck(False)
cdef int _num_assign_ndarray():
cdef:
Py_ssize_t n = 50
double[::1] x = np.zeros(n, dtype=np.float64)
Py_ssize_t i
int k = 0
x[5] = 1.
x[8] = 1.
x[38] = 1.
for i in range(n):
if x[i] == 1.:
k += 1
return k
cpdef test_num_malloc():
return _num_assign_malloc()
cpdef test_num_ndarray():
return _num_assign_ndarray()
timeit
:
>>> import timeit
>>> t = timeit.Timer("from test_malloc import test_num_malloc; test_num_malloc()")
>>> t.timeit(10000)
# 0.0111
>>> t = timeit.Timer("from test_malloc import test_num_ndarray; test_num_ndarray()")
>>> t.timeit(10000)
# 0.0246
When I check what are those unassigned elements in malloc
ed memoryview:
@cython.wraparound(False)
@cython.boundscheck(False)
cdef _malloc_unassign():
cdef:
Py_ssize_t n = 20
double *x = <double *> malloc(n * sizeof(double))
double[::1] y = np.zeros(n, dtype=np.float64)
Py_ssize_t i
x[5] = 1.
x[8] = 1.
for i in range(n):
y[i] = x[i]
free(x)
return y
cpdef test_malloc():
return _malloc_unassign()
I got:
>>> from test_malloc import test_malloc
>>> for t in test_malloc():
... print(t)
...
6.230420704259778e-307
3.5604305343967845e-307
1.6021930623879718e-306
2.447635570273665e-307
1.6911933005114767e-306
1.0
1.0570034520433751e-307
1.2461038328174693e-306
1.0
8.066321387750432e-308
1.201607109425611e-306
1.691193300407861e-306
1.2906222865963489e-306
1.4241722150642528e-306
1.3351264799003858e-306
7.56596412080425e-307
7.565984490202416e-307
1.6021903463576915e-306
1.424109743093715e-306
2.2250738585072396e-306
Seems I can simply cheched whether it is assigned by x[i] == 1.
, but I don't think this is good enough, especially according to this answer, those unassigned elements are treated as random; and if I used int
instead of double
.
Edit
Thanks @DavidW for providing calloc
.
Adding this to my test_malloc.pyx
:
@cython.boundscheck(False)
@cython.wraparound(False)
cdef int _num_assign_calloc():
cdef:
Py_ssize_t n = 50
double *x = <double *> calloc(n, sizeof(double))
Py_ssize_t i
int k = 0
x[5] = 1.
x[8] = 1.
x[38] = 1.
for i in range(n):
if x[i] == 1.:
k += 1
free(x)
return k
timeit
:
>>> import numpy as np
>>> time_malloc = []
>>> for i in range(1000):
... t = timeit.Timer("from core.test_malloc import test_num_malloc; test_num_malloc()")
... time_malloc.append(t.timeit(10000))
...
...
>>> time_ndarr = []
>>> for i in range(1000):
... t = timeit.Timer("from core.test_malloc import test_num_ndarray; test_num_ndarray()")
... time_ndarr.append(t.timeit(10000))
...
...
>>> time_calloc = []
>>> for i in range(1000):
... t = timeit.Timer("from core.test_malloc import test_num_calloc; test_num_calloc()")
... time_calloc.append(t.timeit(10000))
...
...
>>> print("mean: %.4f, std: %.4f" % (np.mean(time_malloc), np.std(time_malloc)))
mean: 0.0098, std: 0.0034
>>> print("mean: %.4f, std: %.4f" % (np.mean(time_calloc), np.std(time_calloc)))
mean: 0.0098, std: 0.0026
>>> print("mean: %.4f, std: %.4f" % (np.mean(time_ndarr), np.std(time_ndarr)))
mean: 0.0177, std: 0.0033
No. You cannot reliably work out if an element a malloced array has been assigned. That would require the C compiler to add a whole bunch of extra code and storage to track what has been written to.
Try calloc
, which does produce a zeroed array.
It's worth emphasising that one of the reasons malloc
is fast is because it skips the initialization so you may be after an impossible "best of both worlds" - all the advantages of initialization without any of the time cost. With that said, I'd expect calloc
to beat Numpy.