
Speeding up triple loop


Initially I had this loop:

import numpy

datos = numpy.random.rand(1000,17)
clusters = 250
n_variables = 17
centros = numpy.random.rand(clusters,n_variables)
desviaciones = numpy.random.rand(n_variables)
W=1
H=numpy.zeros((len(datos), clusters))
Weight = 1 / n_variables
for k in range(len(datos)):
    inpt = datos[k]
    for i in range(clusters):
        for j in range(n_variables):
            # interval around the centre of cluster i for variable j
            sup = centros[i, j] + W * desviaciones[j]
            inf = centros[i, j] - W * desviaciones[j]
            feat = inpt[j]
            # value strictly inside the interval: add this variable's weight
            if feat < sup and feat > inf:
                H[k, i] += Weight

but the triple loop slows the process down a lot. I then reduced it to:

import numpy

datos = numpy.random.rand(1000,17)
clusters = 250
n_variables = 17
centros = numpy.random.rand(clusters,n_variables)
desviaciones = numpy.random.rand(n_variables)
W=1
H=numpy.zeros((len(datos), clusters))
sup = centros + W*desviaciones
inf = centros - W*desviaciones
Weight = 1 / n_variables
for k in range(len(datos)):
    inpt = datos[k]
    for i in range(clusters):
        # boolean vector: which variables fall strictly inside cluster i's intervals
        suma = (sup[i] > inpt) & (inf[i] < inpt)
        H[k, i] = suma.sum() * Weight

This saves one loop, but I am having trouble removing the other two with NumPy functions. What remains is to apply the 'suma' formula to every row of sup for every row of datos. Is there a way to do that?
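As an intermediate step, the `i` loop can be removed on its own: a single row `inpt` of shape `(n_variables,)` broadcasts against all rows of `sup` and `inf` (shape `(clusters, n_variables)`) at once. The sketch below reuses the question's random setup; the name `inside` is only illustrative.

```python
import numpy

# Same random setup as in the question
datos = numpy.random.rand(1000, 17)
clusters = 250
n_variables = 17
centros = numpy.random.rand(clusters, n_variables)
desviaciones = numpy.random.rand(n_variables)
W = 1
Weight = 1 / n_variables

sup = centros + W * desviaciones   # (clusters, n_variables)
inf = centros - W * desviaciones   # (clusters, n_variables)

H = numpy.zeros((len(datos), clusters))
for k in range(len(datos)):
    inpt = datos[k]                            # (n_variables,)
    # inpt is compared with every cluster at once -> (clusters, n_variables)
    inside = (inpt < sup) & (inpt > inf)
    # count the variables inside the interval for each cluster
    H[k] = inside.sum(axis=1) * Weight         # (clusters,)
```

Only the loop over the rows of `datos` is left; the answer below removes it by adding a third axis.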


Solution

  • You can add a new axis to centros and datos so both become three-dimensional, and let broadcasting compare every row of datos with every cluster at once:

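    centros = centros[None, :, :]    # (   1, 250, 17)
    datos = datos[:, None, :]        # (1000,   1, 17)
    desv = W * desviaciones          # (17,), broadcasts over the last axis
    sup = centros + desv             # (   1, 250, 17)
    inf = centros - desv             # (   1, 250, 17)
    # the comparisons broadcast to (1000, 250, 17); summing over the
    # variables (axis=2) gives H with shape (1000, 250)
    H = Weight * ((datos < sup) & (datos > inf)).sum(axis=2)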
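A quick way to check that the broadcast version computes the same H as the original triple loop is to run both on a small random problem. This is a sketch; the function names `h_loop` and `h_broadcast` are hypothetical wrappers around the question's loop and the answer's code, and the small sizes only keep the pure-Python loop fast.

```python
import numpy

def h_loop(datos, centros, desviaciones, W=1):
    """Triple loop from the question."""
    n_datos, n_variables = datos.shape
    clusters = centros.shape[0]
    Weight = 1 / n_variables
    H = numpy.zeros((n_datos, clusters))
    for k in range(n_datos):
        for i in range(clusters):
            for j in range(n_variables):
                sup = centros[i, j] + W * desviaciones[j]
                inf = centros[i, j] - W * desviaciones[j]
                if inf < datos[k, j] < sup:
                    H[k, i] += Weight
    return H

def h_broadcast(datos, centros, desviaciones, W=1):
    """Fully broadcast version from the answer."""
    Weight = 1 / datos.shape[1]
    desv = W * desviaciones
    sup = centros[None, :, :] + desv   # (1, clusters, n_variables)
    inf = centros[None, :, :] - desv   # (1, clusters, n_variables)
    x = datos[:, None, :]              # (n_datos, 1, n_variables)
    return Weight * ((x < sup) & (x > inf)).sum(axis=2)

rng = numpy.random.default_rng(0)
datos = rng.random((50, 17))
centros = rng.random((20, 17))
desviaciones = rng.random(17)

H1 = h_loop(datos, centros, desviaciones)
H2 = h_broadcast(datos, centros, desviaciones)
print(H2.shape)                 # (50, 20)
print(numpy.allclose(H1, H2))   # True
```

`allclose` is used instead of `==` because the loop adds `Weight` repeatedly while the broadcast version multiplies a count by it, which can differ in the last bits. Note also that the broadcast version materialises a boolean array of shape `(len(datos), clusters, n_variables)`; for the question's sizes that is 1000 × 250 × 17 = 4,250,000 elements (about 4 MB as booleans), which is fine, but it grows with all three dimensions.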