Tags: python, pytorch, conv-neural-network

How to format input data for PyTorch?


I have written a convolutional neural network from scratch before, but I've decided to use PyTorch for its speed. However, I could not find documentation on how to format input for the Conv2d layer. In general, there seems to be a lot of overhead and wrappers that prevent me from seeing exactly what is happening and writing my code accordingly.

I have trained a model on the MNIST dataset and loaded the saved weights in order to run it (as per the tutorial):

import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, 3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.conv2 = nn.Conv2d(8, 8, 3, stride=1, padding=1)
        self.linear1 = nn.Linear(7 * 7 * 8, 128)
        self.linear2 = nn.Linear(128, 128)
        self.linear3 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # (N, 8, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))  # (N, 8, 7, 7)
        x = torch.flatten(x, 1)               # flatten everything except the batch dim
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = self.linear3(x)
        return x

my_model = NeuralNetwork()
my_model.load_state_dict(torch.load("model_weights.pth", weights_only=True))
my_model.eval()

Now, I have a web application where:

  1. The user draws on a 28x28 canvas in black and white.
  2. The drawing is put into a flattened array of size 784, consisting of 0's (white on canvas) and 1's (black on canvas). (e.g. [0, 0, 1, 1, 1, 1, 0, 0, ..., 1, 1])

Here is a sketch of what I want to do:

formatted_array = some_formatting_function(flattened_array_of_0_and_1)
x = torch.tensor(formatted_array)
pred = my_model(x)
guessed_digit = some_reading_function(pred)
print(guessed_digit)

# eventually return the guessed_digit

What should my some_formatting_function and some_reading_function be?


Solution

  • Formatting input data

    The input of the model should match what the first layer expects, which is a Conv2d in your case. According to PyTorch's documentation on Conv2d, the input of such a layer must be of shape (N, C_in, H_in, W_in) or (C_in, H_in, W_in), where N is the batch size, C_in is the number of channels (1 in your case), H_in is the image height (28) and W_in is the image width (28). Since you only evaluate inputs one by one, a batch size of N = 1 is all you need. Note that while Conv2d itself also accepts the unbatched (C_in, H_in, W_in) form, your forward calls torch.flatten(x, 1), which assumes a leading batch dimension and would flatten a 3-D input incorrectly, so keep the batch dimension explicit.

    This means you should pass a tensor of shape (1, 1, 28, 28) to your model. To obtain it, you could do something like:

    formatted_array = torch.tensor(flattened_array_of_0_and_1, dtype=torch.float32).view(1, 1, 28, 28)

    optionally followed by a .transpose(2, 3) to swap the two spatial dimensions if they come out inverted. Note the explicit dtype: built from a list of Python ints, torch.tensor would otherwise produce an integer tensor, which the convolution's float32 weights will reject.

    You may also consider not flattening the data between the user drawing and the neural network inference, but you should still use .view(...) to add the batch and channel dimensions to your input tensor; a complete sketch of some_formatting_function follows below.
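
    Putting that together, a minimal sketch of the formatting function (the name comes from your pseudocode; the float32 dtype and the sanity check are the only additions):

    import torch

    def some_formatting_function(flat_list):
        # 784 ints (0 or 1) -> float tensor of shape (N, C, H, W) = (1, 1, 28, 28)
        # Conv2d rejects integer input because its weights are float32
        x = torch.tensor(flat_list, dtype=torch.float32)
        assert x.numel() == 784, "expected a 28x28 drawing"
        return x.view(1, 1, 28, 28)

    x = some_formatting_function([0] * 784)
    print(x.shape)  # torch.Size([1, 1, 28, 28])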

  • Reading a prediction

    Classifier neural networks use one-hot encoding, meaning they are trained to output (for each training sample) a target vector of all zeros, except for a one in the dimension corresponding to the sample's category. During training the network learns to get as close as possible to that representation; during inference, we pick the dimension with the highest value in the output vector and use its index as the predicted label. Your model outputs raw logits rather than probabilities (there is no final softmax), but the argmax is the same either way. You can do this with argmax(): guessed_digit = pred.argmax().item(), where .item() turns the zero-dimensional result tensor into a plain Python int.
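
    End to end, your pseudocode then becomes (a sketch under the same assumptions as above; torch.no_grad() simply skips gradient bookkeeping, which is unnecessary for inference):

    def some_reading_function(pred):
        # pred has shape (1, 10): one logit per digit class
        return pred.argmax().item()

    x = some_formatting_function(flattened_array_of_0_and_1)
    with torch.no_grad():
        pred = my_model(x)
    guessed_digit = some_reading_function(pred)
    print(guessed_digit)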