Tags: python, gpu, numba, acceleration

GPU-accelerate neural network calculations


I'm working through Nvidia's "Fundamentals of Accelerated Computing with CUDA Python" course and was given a task to refactor a simple version of some code that performs the work needed to create a hidden layer in a neural network:

import numpy as np
from numba import cuda, vectorize

n = 1000000

greyscales = np.floor(np.random.uniform(0, 255, n).astype(np.float32))
weights = np.random.normal(.5, .1, n).astype(np.float32)

from numpy import exp

def normalize(grayscales):
    return grayscales / 255

def weigh(values, weights):
    return values * weights
    
def activate(values):
    return ( exp(values) - exp(-values) ) / ( exp(values) + exp(-values) )

def create_hidden_layer(n, greyscales, weights, exp, normalize, weigh, activate):
    normalized = normalize(greyscales)
    weighted = weigh(normalized, weights)
    activated = activate(weighted)
    return activated

arguments = {"n":n,
            "greyscales": greyscales,
            "weights": weights,
            "exp": exp,
            "normalize": normalize,
            "weigh": weigh,
            "activate": activate}

a = create_hidden_layer(**arguments)
print(a)
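As an aside, `activate` is just the hyperbolic tangent, so the whole pipeline reduces to `tanh(greyscales / 255 * weights)`. A quick NumPy sanity check (mine, not part of the course code) confirms this on a few sample values:

```python
import numpy as np

# activate(x) = (e^x - e^-x) / (e^x + e^-x) is exactly tanh(x),
# so the pipeline is tanh(greyscales / 255 * weights).
greyscales = np.array([0.0, 127.0, 255.0], dtype=np.float32)
weights = np.array([0.5, 0.5, 0.5], dtype=np.float32)

normalized = greyscales / 255
weighted = normalized * weights
activated = (np.exp(weighted) - np.exp(-weighted)) / (np.exp(weighted) + np.exp(-weighted))

# Matches np.tanh applied to the weighted inputs.
print(activated)
```

This is handy for verifying that any GPU refactor still produces the right numbers.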

I have transformed the code a bit; after my modifications it looks like this:

import numpy as np
from numba import cuda, vectorize
from math import exp  # math.exp compiles inside CUDA ufuncs; numpy.exp does not

@vectorize(['float32(float32)'],target='cuda')
def normalize(grayscales):
    return grayscales / 255

@vectorize(['float32(float32,float32)'],target='cuda')
def weigh(values, weights):
    return values * weights

@vectorize(['float32(float32)'],target='cuda')
def activate(values):
    return ( exp(values) - exp(-values) ) / ( exp(values) + exp(-values) )

def create_hidden_layer(n, greyscales, weights, exp, normalize, weigh, activate):
    normalized = normalize(greyscales)
    weighted = weigh(normalized, weights)
    activated = activate(weighted)
    return activated

greyscales = cuda.to_device(greyscales)
weights = cuda.to_device(weights)

normalized = cuda.device_array(shape=(n,), dtype=np.float32)
weighted = cuda.device_array(shape=(n,), dtype=np.float32)
activated = cuda.device_array(shape=(n,), dtype=np.float32)

activated = activated.copy_to_host()

arguments = {"n":n,
            "greyscales": greyscales,
            "weights": weights,
            "exp": exp,
            "normalize": normalize,
            "weigh": weigh,
            "activate": activate}

a = create_hidden_layer(**arguments)
print(a)

The code seems to work fine after all the transformations, but there is one problem: it's not fast enough. The task states that the code should run in less than 1 s, while mine runs in 1.23 s.

Does anyone know how I could refactor my code further, or spot any silly mistakes I have made? I would be very grateful for any help!


Solution

  • greyscales = cuda.to_device(greyscales)
    weights = cuda.to_device(weights)
    
    normalized = cuda.device_array(shape=(n,), dtype=np.float32)
    weighted = cuda.device_array(shape=(n,), dtype=np.float32)
    activated = cuda.device_array(shape=(n,), dtype=np.float32)
    
    activated = activated.copy_to_host()
    

    Move this section inside the "create_hidden_layer" function: transfer the inputs to the device at the start, and copy `activated` back to the host only after `activate` has run. (As written, `activated.copy_to_host()` executes before any computation, so it just copies an uninitialized device array.) I did that and it ran in ~0.5 secs.