Initially I had this triple loop:
import numpy
datos = numpy.random.rand(1000,17)
clusters = 250
n_variables = 17
centros = numpy.random.rand(clusters,n_variables)
desviaciones = numpy.random.rand(n_variables)
W=1
H=numpy.zeros((len(datos), clusters))
Weight = 1 / n_variables
for k in range(len(datos)):
    inpt = datos[k]
    for i in range(clusters):
        for j in range(n_variables):
            sup = centros[i][j] + W * desviaciones[j]
            inf = centros[i][j] - W * desviaciones[j]
            feat = inpt[j]  # was np.array(inpt[j]), but only `numpy` is imported
            if inf < feat < sup:
                H[k, i] += Weight
but a triple loop slows the process down a lot. I was able to reduce it to:
import numpy
datos = numpy.random.rand(1000,17)
clusters = 250
n_variables = 17
centros = numpy.random.rand(clusters,n_variables)
desviaciones = numpy.random.rand(n_variables)
W=1
H=numpy.zeros((len(datos), clusters))
sup = centros + W*desviaciones
inf = centros - W*desviaciones
Weight = 1 / n_variables
for k in range(len(datos)):
    inpt = datos[k]
    for i in range(clusters):
        suma = (sup[i] > inpt) & (inf[i] < inpt)
        H[k, i] = suma.sum() * Weight
This removes one loop, but I am having trouble eliminating the other two with numpy functions. All that is left is to apply the 'suma' formula to every combination of a row of sup (and inf) with a row of datos. Do you know a way to do that?
You can reshape centros and datos to three dimensions to take advantage of broadcasting:
centros = centros[None, :, :] # ( 1, 250, 17)
datos = datos[:, None, :] # (1000, 1, 17)
desv = W * desviaciones
sup = centros + desv
inf = centros - desv
H = Weight * ((datos < sup) & (datos > inf)).sum(axis=2)
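
As a sanity check, here is a minimal sketch that compares the broadcast expression with the double loop from the question. It uses smaller array sizes than the question so it runs quickly. The names H_loop, H_vec and d3, the fixed seed, and the sizes are assumptions made for illustration only and do not appear in the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n_datos, clusters, n_variables = 50, 8, 17  # smaller than the question's sizes
datos = rng.random((n_datos, n_variables))
centros = rng.random((clusters, n_variables))
desviaciones = rng.random(n_variables)
W = 1
Weight = 1 / n_variables

sup = centros + W * desviaciones  # upper bounds, shape (clusters, n_variables)
inf = centros - W * desviaciones  # lower bounds, shape (clusters, n_variables)

# Reference result: the double loop from the question.
H_loop = np.zeros((n_datos, clusters))
for k in range(n_datos):
    for i in range(clusters):
        suma = (sup[i] > datos[k]) & (inf[i] < datos[k])
        H_loop[k, i] = suma.sum() * Weight

# Broadcast version: (n_datos, 1, n_variables) against (1, clusters, n_variables)
# gives a boolean array of shape (n_datos, clusters, n_variables).
d3 = datos[:, None, :]
inside = (d3 < sup[None, :, :]) & (d3 > inf[None, :, :])
H_vec = Weight * inside.sum(axis=2)  # count the features inside the interval

print(H_vec.shape)
print(np.allclose(H_loop, H_vec))
```

The two results should agree. Note that the intermediate boolean array holds one entry per (row, cluster, variable) combination, so with the question's sizes it has 1000 * 250 * 17 elements; for much larger inputs it may be worth processing datos in chunks.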