I'm working through Nvidia's "Fundamentals of Accelerated Computing with CUDA Python" course and have been given a task to refactor a simple version of some code that performs the work needed to create a hidden layer in a neural network:
import numpy as np
from numba import cuda, vectorize

n = 1000000

greyscales = np.floor(np.random.uniform(0, 255, n).astype(np.float32))
weights = np.random.normal(.5, .1, n).astype(np.float32)

from numpy import exp

def normalize(grayscales):
    return grayscales / 255

def weigh(values, weights):
    return values * weights

def activate(values):
    return ( exp(values) - exp(-values) ) / ( exp(values) + exp(-values) )

def create_hidden_layer(n, greyscales, weights, exp, normalize, weigh, activate):
    normalized = normalize(greyscales)
    weighted = weigh(normalized, weights)
    activated = activate(weighted)
    return activated

arguments = {"n": n,
             "greyscales": greyscales,
             "weights": weights,
             "exp": exp,
             "normalize": normalize,
             "weigh": weigh,
             "activate": activate}

a = create_hidden_layer(**arguments)
print(a)
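As a side note, the formula in activate is the hyperbolic tangent written out with exponentials. A minimal NumPy-only check of that identity, using a small hypothetical sample array x that is not part of the task:

```python
import numpy as np

def activate(values):
    # Same formula as in the task, using NumPy's array-aware exp.
    return (np.exp(values) - np.exp(-values)) / (np.exp(values) + np.exp(-values))

# Hypothetical sample inputs, chosen only for this check.
x = np.linspace(-3, 3, 7, dtype=np.float32)

# Compare against NumPy's built-in tanh, within float32 tolerance.
matches_tanh = np.allclose(activate(x), np.tanh(x), atol=1e-5)
print(matches_tanh)
```

This only confirms what the function computes. It says nothing about the speed of the GPU version.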
I have modified the code a little, and it now looks like this:
from math import exp

@vectorize(['float32(float32)'], target='cuda')
def normalize(grayscales):
    return grayscales / 255

@vectorize(['float32(float32,float32)'], target='cuda')
def weigh(values, weights):
    return values * weights

@vectorize(['float32(float32)'], target='cuda')
def activate(values):
    return ( exp(values) - exp(-values) ) / ( exp(values) + exp(-values) )

def create_hidden_layer(n, greyscales, weights, exp, normalize, weigh, activate):
    normalized = normalize(greyscales)
    weighted = weigh(normalized, weights)
    activated = activate(weighted)
    return activated

greyscales = cuda.to_device(greyscales)
weights = cuda.to_device(weights)
normalized = cuda.device_array(shape=(n,), dtype=np.float32)
weighted = cuda.device_array(shape=(n,), dtype=np.float32)
activated = cuda.device_array(shape=(n,), dtype=np.float32)
activated = activated.copy_to_host()

arguments = {"n": n,
             "greyscales": greyscales,
             "weights": weights,
             "exp": exp,
             "normalize": normalize,
             "weigh": weigh,
             "activate": activate}

a = create_hidden_layer(**arguments)
print(a)
The code seems to work fine after all the transformations, but there is one problem: it's not fast enough. The task states that the code should run in less than 1 s, while my code runs in 1.23 s.
Does anyone know how I could refactor my code further, or notice any silly mistakes I have made? I would be very grateful for any help!
greyscales = cuda.to_device(greyscales)
weights = cuda.to_device(weights)
normalized = cuda.device_array(shape=(n,), dtype=np.float32)
weighted = cuda.device_array(shape=(n,), dtype=np.float32)
activated = cuda.device_array(shape=(n,), dtype=np.float32)
activated = activated.copy_to_host()
Move this section inside the "create_hidden_layer" function. When I did that, it ran in about 0.5 seconds.
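A rough sketch of what that change could look like, assuming Numba and a CUDA-capable GPU. The names HAVE_GPU, as_ufunc and the d_-prefixed variables are hypothetical, introduced only for illustration. Passing out= so that intermediate results stay on the device is an assumption on top of what this answer states. When no GPU is available, the sketch falls back to the plain NumPy pipeline from the question so that it still runs.

```python
import numpy as np

try:
    from numba import cuda, vectorize
    HAVE_GPU = cuda.is_available()
except Exception:  # Numba missing or the CUDA driver could not be loaded
    HAVE_GPU = False

if HAVE_GPU:
    from math import exp  # scalar exp, compiled into each CUDA ufunc

    def as_ufunc(signature):
        # Hypothetical helper: compile the decorated function as a CUDA ufunc.
        return vectorize([signature], target='cuda')
else:
    from numpy import exp  # array-aware exp for the CPU fallback

    def as_ufunc(signature):
        # Hypothetical helper: leave the function as plain Python/NumPy.
        return lambda func: func

n = 1000000
greyscales = np.floor(np.random.uniform(0, 255, n).astype(np.float32))
weights = np.random.normal(.5, .1, n).astype(np.float32)

@as_ufunc('float32(float32)')
def normalize(grayscales):
    return grayscales / 255

@as_ufunc('float32(float32, float32)')
def weigh(values, weights):
    return values * weights

@as_ufunc('float32(float32)')
def activate(values):
    return (exp(values) - exp(-values)) / (exp(values) + exp(-values))

def create_hidden_layer(n, greyscales, weights, exp, normalize, weigh, activate):
    if not HAVE_GPU:
        # CPU fallback: the original NumPy pipeline from the question.
        return activate(weigh(normalize(greyscales), weights))

    # Copy the inputs to the device once.
    d_greyscales = cuda.to_device(greyscales)
    d_weights = cuda.to_device(weights)

    # Preallocate device buffers for every intermediate result.
    d_normalized = cuda.device_array(shape=(n,), dtype=np.float32)
    d_weighted = cuda.device_array(shape=(n,), dtype=np.float32)
    d_activated = cuda.device_array(shape=(n,), dtype=np.float32)

    # Assumption: out= writes each result into a device buffer,
    # so nothing is copied back to the host between the three steps.
    normalize(d_greyscales, out=d_normalized)
    weigh(d_normalized, d_weights, out=d_weighted)
    activate(d_weighted, out=d_activated)

    # Single device-to-host copy, after the result has been computed.
    return d_activated.copy_to_host()

arguments = {"n": n,
             "greyscales": greyscales,
             "weights": weights,
             "exp": exp,
             "normalize": normalize,
             "weigh": weigh,
             "activate": activate}

a = create_hidden_layer(**arguments)
print(a)
```

In this sketch copy_to_host() is called only after activate has filled d_activated, so the function returns a host NumPy array holding the computed values. No timing is claimed for this version. The ~0.5 second figure above refers to the change as described in this answer.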