Tags: python, python-3.x, numpy, foreach, linspace

Replace items in a list with other items (small lists) depending on several conditions


I have a list of numbers, and I want to replace each number with a binary pattern (a small list) chosen according to several conditions. I have working code, but I am wondering whether there is a faster, more efficient approach, especially if I want to add more conditions.

Thanks

import numpy as np
n = []
z = np.linspace(0,5,8) # 8 evenly spaced edges -> 7 bins
t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295, 1.5178, 0.3853, 4.2848, 4.30911, 3.2299, 1.8528, 0.6553, 3.3305, 4.1504, 1.8787]
for i in t:
    if i>=z[0] and i<z[1]:
        n.extend([0,0,0,0,0])
    elif i>=z[1] and i<z[2]:
        n.extend([0,0,0,0,1])
    elif i>=z[2] and i<z[3]:
        n.extend([0,0,0,1,0])
    elif i>=z[3] and i<z[4]:
        n.extend([0,0,0,1,1])
    elif i>=z[4] and i<z[5]:
        n.extend([0,0,1,0,0])
    elif i>=z[5] and i<z[6]:
        n.extend([0,0,1,0,1])
    elif i>=z[6] and i<z[7]:
        n.extend([0,0,1,1,0])
new_n = np.asarray(n).reshape(len(t),5) # new_n is the final pattern I want.
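Since `z` is evenly spaced, the bin index for each value can also be computed directly with floor division, which avoids the if/elif chain entirely. A minimal sketch (not part of the original question; `step` and the first few values of `t` are taken from the code above):

```python
import numpy as np

z = np.linspace(0, 5, 8)   # 8 edges -> 7 equal-width bins
step = z[1] - z[0]         # bin width, 5/7 here

t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295]

# For evenly spaced edges, floor division by the bin width gives the
# bin index of each value in [0, 5) in a single step.
bins = [int(x // step) for x in t]
print(bins)  # [5, 5, 3, 1, 0]
```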

Solution

  • This is not an answer per se, but it will probably be faster, since it uses numpy vectorization rather than a Python for loop.

    First, you want to perform some binning:

    >>> bins = np.digitize(t, z) - 1 # minus 1 just to align our shapes
    >>> bins
    array([5, 5, 3, 1, 0, 2, 0, 5, 6, 4, 2, 0, 4, 5, 2])
    

    This tells you in what bin each of your values goes. Next, define your patterns, in order:

    >>> patterns = np.array([
        [0,0,0,0,0],
        [0,0,0,0,1],
        [0,0,0,1,0],
        [0,0,0,1,1],
        [0,0,1,0,0],
        [0,0,1,0,1],
        [0,0,1,1,0],
    ])
    

    Now for some numpy magic: instead of appending/extending, create an array full of zeros (this should almost always be faster). This array will have shape (len(t), len(z)-1). Using this SO answer, we also do one-hot encoding:

    >>> inds = np.zeros((len(t), len(z)-1))
    >>> inds[np.arange(len(t)), bins] = 1
    >>> inds
    array([[0., 0., 0., 0., 0., 1., 0.],
           [0., 0., 0., 0., 0., 1., 0.],
           [0., 0., 0., 1., 0., 0., 0.],
           .....,
           [0., 0., 0., 0., 0., 1., 0.],
           [0., 0., 1., 0., 0., 0., 0.]])
    
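As an aside (my addition, not part of the original answer): the same one-hot matrix can be built by row-indexing an identity matrix, since row k of np.eye(n) is the one-hot encoding of k:

```python
import numpy as np

bins = np.array([5, 5, 3, 1, 0])  # bin index per value, as returned by np.digitize
n_bins = 7

# Row k of eye(n_bins) is the one-hot vector for k, so indexing with
# `bins` produces the whole one-hot matrix in a single operation.
inds = np.eye(n_bins)[bins]
print(inds.shape)  # (5, 7)
```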

    Finally, all we need is a matrix multiplication:

    >>> inds @ patterns
    array([[0., 0., 1., 0., 1.],
           [0., 0., 1., 0., 1.],
           [0., 0., 0., 1., 1.],
           ....
           [0., 0., 1., 0., 1.],
           [0., 0., 0., 1., 0.]])
    

    I did not perform a rigorous timing test, but from my brief experimentation here are my results:

    Your loop: 17.7 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    My implementation: 8.49 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    Which may or may not scale well to larger datasets. Hope this helps :)
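    Putting the steps above together into one runnable snippet; note (my addition) that fancy indexing `patterns[bins]` selects the same rows directly, so the one-hot matrix and the matrix product can be skipped when only the final pattern array is needed:

```python
import numpy as np

z = np.linspace(0, 5, 8)
t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295, 1.5178, 0.3853, 4.2848,
     4.30911, 3.2299, 1.8528, 0.6553, 3.3305, 4.1504, 1.8787]

bins = np.digitize(t, z) - 1
patterns = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
])

# One-hot encoding followed by a matrix product, as above.
inds = np.zeros((len(t), len(z) - 1))
inds[np.arange(len(t)), bins] = 1
result = inds @ patterns

# Fancy indexing picks the same rows of `patterns` directly.
direct = patterns[bins]
assert np.array_equal(result, direct)
```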


    Edit: Following Alexander Lopatin's answer I was surprised to see that my method was significantly slower. On further investigation, one conclusion I arrived at is that numpy's functions carry significant per-call overhead, which is a steep price to pay for only a few values of t. For larger lists the overhead becomes insignificant, yet the expected performance gain still does not appear:


    timings = {
        10: [7.79, 24.1, 21.7],
        16: [10.7, 29.9, 22.9],
        24: [14.6, 40.5, 23.4],
        33: [19.1, 48.6, 23.4],
        38: [21.9, 55.9, 23.9],
        47: [26.7, 66.2, 24.1],
        61: [33, 79.5, 24.7],
        75: [40.8, 92.6, 25.8],
        89: [47.6, 108, 26.2],
        118: [60.1, 136, 27.4],
        236: [118, 264, 33.1],
        472: [236, 495, 40.9],
        1000: [657, 922, 52],
        10000: [6530, 9090, 329]
    }
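    For reproducibility, timings like the table above can be collected with the standard `timeit` module. A sketch (my addition; `loop_version` and `numpy_version` are illustrative names, and the exact functions behind the three columns above are not reproduced here):

```python
import timeit

import numpy as np

PATTERNS = np.array([
    [0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [0, 0, 0, 1, 0], [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0], [0, 0, 1, 0, 1], [0, 0, 1, 1, 0],
])

def loop_version(t, z):
    # The original if/elif approach, condensed into a lookup loop.
    n = []
    for i in t:
        for k in range(len(z) - 1):
            if z[k] <= i < z[k + 1]:
                n.extend(PATTERNS[k])
                break
    return np.asarray(n).reshape(len(t), 5)

def numpy_version(t, z):
    # digitize + one-hot + matrix product, as in the answer above.
    bins = np.digitize(t, z) - 1
    inds = np.zeros((len(t), len(z) - 1))
    inds[np.arange(len(t)), bins] = 1
    return inds @ PATTERNS

z = np.linspace(0, 5, 8)
t = list(np.random.default_rng(0).uniform(0, 5, 100))

# Both versions must agree before timing them.
assert np.array_equal(loop_version(t, z), numpy_version(t, z))

for f in (loop_version, numpy_version):
    print(f.__name__, timeit.timeit(lambda: f(t, z), number=1000))
```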
    
