I'm migrating from Python 2 to Python 3 and have run into an issue, which I have simplified to this:
import numpy as np
a = np.array([1, 2, None])
(a > 0).nonzero()
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: '>' not supported between instances of 'NoneType' and 'int'
In reality I'm processing NumPy arrays with millions of elements and really need to keep the vectorized NumPy operation for performance. In Python 2 this worked fine and returned what I expected, since Python 2 is less strict about comparing values of different types. What is the best approach for migrating this?
One way to achieve the desired result is to use a lambda function with np.vectorize:
>>> a = np.array([1, 2, None, 4, -1])
>>> f = np.vectorize(lambda t: t and t>0)
>>> np.where(f(a))
(array([0, 1, 3], dtype=int64),)
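A possible refinement of the np.vectorize approach above, offered as a sketch rather than a definitive fix: when otypes is not given, np.vectorize infers the output dtype by evaluating the function on the first element, so the result type can depend on what happens to come first in the array. Testing for None explicitly and passing otypes=[bool] avoids that. The name is_positive is made up for this illustration.

```python
import numpy as np

# Hypothetical variant of the vectorized predicate: check for None explicitly
# and fix the output dtype so it does not depend on the first element.
is_positive = np.vectorize(lambda t: t is not None and t > 0, otypes=[bool])

# Object array that starts with None and also contains 0.
a = np.array([None, 0, 1, 2, None, 4, -1], dtype=object)

mask = is_positive(a)             # boolean array, same shape as a
positive_idx = np.where(mask)[0]  # indices of the strictly positive entries
print(positive_idx)
```

This still calls a Python function once per element, so it is a convenience rather than a performance optimization.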
Of course, if the array doesn't contain negative integers, you could just use np.where(a), as both None and 0 would evaluate to False:
>>> a = np.array([1, 2, None, 4, 0])
>>> np.where(a)
(array([0, 1, 3], dtype=int64),)
Another way this can be solved is by first converting the array to use the float dtype, which has the effect of converting None to np.nan. Then np.where(a>0) can be used as normal.
>>> a = np.array([1, 2, None, 4, -1])
>>> np.where(a.astype(float) > 0)
(array([0, 1, 3], dtype=int64),)
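To make the float conversion above a bit more concrete, here is a small sketch that shows the intermediate array and why the None entries drop out: any ordering comparison with nan is False. The variable names are only illustrative. One caveat to keep in mind is that float64 represents integers exactly only up to 2**53, so this assumes the values are well below that.

```python
import numpy as np

a = np.array([1, 2, None, 4, -1])  # object dtype because of the None
as_float = a.astype(float)         # None becomes nan
print(as_float)

mask = as_float > 0                # nan > 0 is False, so None entries are excluded
positive_idx = np.flatnonzero(mask)

# The positions that held None can still be recovered from the float array.
missing_idx = np.flatnonzero(np.isnan(as_float))
print(positive_idx, missing_idx)
```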
Time comparison:
So Bob's approach (replacing None with 0 on a copy of the array), while not as easy on the eyes, is about twice as fast as the np.vectorize approach, and slightly slower than the float conversion approach.
Code to reproduce:
import perfplot
import numpy as np
f = np.vectorize(lambda t: t and t>0)
choices = list(range(-10,11)) + [None]
def cdjb(arr):
    return np.where(f(arr))

def cdjb2(arr):
    return np.where(arr.astype(float) > 0)

def Bob(arr):
    deep_copy = np.copy(arr)
    deep_copy[deep_copy == None] = 0
    return (deep_copy > 0).nonzero()[0]

perfplot.show(
    setup=lambda n: np.random.choice(choices, size=n),
    n_range=[2**k for k in range(25)],
    kernels=[
        cdjb, cdjb2, Bob
    ],
    xlabel='len(a)',
)
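If perfplot is not available, a rough comparison can be sketched with the standard library's timeit instead. This is only an approximation of the benchmark above, under a few assumptions: a single array size of 100,000 is used, the function names are illustrative, and is_positive is a hypothetical variant of the vectorized predicate that checks for None explicitly and passes otypes=[bool]. Timings will vary by machine and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np

choices = list(range(-10, 11)) + [None]

# Hypothetical explicit-None variant of the vectorized predicate.
is_positive = np.vectorize(lambda t: t is not None and t > 0, otypes=[bool])

def vectorized(arr):
    return np.where(is_positive(arr))[0]

def float_conversion(arr):
    return np.where(arr.astype(float) > 0)[0]

def replace_none(arr):
    filled = np.copy(arr)
    filled[filled == None] = 0  # noqa: E711 (elementwise comparison on an object array)
    return (filled > 0).nonzero()[0]

np.random.seed(0)
arr = np.random.choice(choices, size=100_000)  # object array containing ints and None

for kernel in (vectorized, float_conversion, replace_none):
    seconds = timeit.timeit(lambda: kernel(arr), number=5) / 5
    print(f"{kernel.__name__:>16}: {seconds * 1e3:.2f} ms per call")
```

All three functions return the same index array, so whichever is fastest on the real data can be swapped in without changing results.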