Tags: python, tensorflow, neural-network, pytorch, feed-forward

How to create a custom neural network with custom weight initialization in TensorFlow or PyTorch


I'm trying to create a small neural network with custom connections between neurons. The connections should span several layers and not be fully connected (i.e. sparse), as shown in the picture. I would also like to do the weight initialization manually rather than completely randomly. My goal is to determine whether a connection is positive or negative. Is it possible to create such a neural net in TensorFlow (Python/JS) or PyTorch?

[Image: sketch of the desired sparse connections across several layers]


Solution

  • To summarize:
    Can you do it? -- Yes, absolutely.
    Is it going to be pretty? -- No, absolutely not.

    In my explanation, I will focus on PyTorch, as it is the library I am more comfortable with, and it is especially useful if you have custom operations that you can easily express in a Pythonic manner. TensorFlow also has an eager execution mode (with more serious integration since version 2, if I remember correctly), but it is traditionally built around static computational graphs, which make this whole thing a little uglier than it needs to be.

    As you hopefully know, backpropagation (the "learning" step in any ANN) is basically a backward pass through the network to calculate the gradients of the loss with respect to the weights, which is close enough to the truth for the problem at hand. Importantly, torch operations record this "reverse" direction as they execute, which makes it trivial for the user to trigger backpropagation with a single call to backward().
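
    As a minimal illustration of that bookkeeping (the values here are arbitrary, this is just a sketch): any tensor created with requires_grad=True is tracked through the forward pass, and one call to backward() fills in the gradients.

    import torch

    # Any tensor with requires_grad=True is tracked by autograd.
    w = torch.tensor([2.0], requires_grad=True)
    x = torch.tensor([3.0])

    loss = (w * x - 1.0).pow(2).sum()  # the forward pass records the graph
    loss.backward()                    # the backward pass computes gradients

    print(w.grad)  # d(loss)/dw = 2 * (w*x - 1) * x = tensor([30.])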

    To model a simple network as described in your image, there is only one major disadvantage:
    The available operations usually excel at what they do because they are simple and can be optimized quite heavily. In your case, you have to express the different layers as custom operations, which generally scales incredibly poorly, unless you can express the computation as some form of matrix operation, which I do not see straight away in your example. I am further assuming that you are applying some form of non-linearity, as the network would otherwise fail on any problem that is not linearly separable.
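
    As a side note, here is a minimal sketch of the "matrix operation" idea just mentioned (this is my own workaround, not something from your image, and the mask values are purely illustrative): keep one nn.Linear per layer and multiply its weight by a fixed 0/1 mask, so pruned connections stay at zero while the layer remains a single dense matrix multiply.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskedLinear(nn.Module):
        """Linear layer whose connectivity is restricted by a fixed 0/1 mask."""
        def __init__(self, in_features, out_features, mask):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)
            # mask shape must match the weight matrix: (out_features, in_features)
            self.register_buffer("mask", mask)

        def forward(self, x):
            # Zero out the pruned connections on every forward pass;
            # masked weights therefore also receive zero gradient.
            return F.linear(x, self.linear.weight * self.mask, self.linear.bias)

    # Example: 2 inputs -> 2 hidden units, each hidden unit sees only one input.
    mask = torch.tensor([[1., 0.],
                         [0., 1.]])
    layer = MaskedLinear(2, 2, mask)
    out = layer(torch.randn(4, 2))  # shape (4, 2)

    The explicit, node-by-node version of the network in your image then looks like this: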

    import torch
    import torch.nn as nn

    class CustomNetwork(nn.Module):
        def __init__(self):
            super().__init__()
            self.h_1_1 = nn.Sequential(nn.Linear(1, 2), nn.ReLU()) # top node in first layer
            self.h_1_2 = nn.Sequential(nn.Linear(1, 2), nn.ReLU()) # bottom node in first layer
            # Note that these nodes have no shared weights, which is why we
            # have to initialize them separately.
            self.h_2_1 = nn.Sequential(nn.Linear(1, 1), nn.ReLU()) # top node in second layer
            self.h_2_2 = nn.Sequential(nn.Linear(1, 1), nn.ReLU()) # bottom node in second layer

            self.h_3_1 = nn.Sequential(nn.Linear(2, 1), nn.ReLU()) # top node in third layer
            self.h_3_2 = nn.Sequential(nn.Linear(2, 1), nn.ReLU()) # bottom node in third layer
            # out doesn't require an activation function, as it is paired with the loss function
            self.out = nn.Linear(2, 1)

        def forward(self, x):
            # x.shape: (batch_size, 2)

            # first layer. shapes of (batch_size, 2), respectively
            # (slices keep the trailing feature dimension that nn.Linear expects)
            out_top = self.h_1_1(x[:, 0:1])
            out_bottom = self.h_1_2(x[:, 1:2])

            # second layer. shapes of (batch_size, 1), respectively
            out_top_2 = self.h_2_1(out_top[:, 0:1])
            out_bottom_2 = self.h_2_2(out_bottom[:, 0:1])

            # third layer. shapes of (batch_size, 1), respectively
            # additional concatenation of previous outputs required.
            out_top_3 = self.h_3_1(torch.cat([out_top_2, -1 * out_top[:, 1:2]], dim=1))
            out_bottom_3 = self.h_3_2(torch.cat([out_bottom_2, -1 * out_bottom[:, 1:2]], dim=1))
            return self.out(torch.cat([out_top_3, out_bottom_3], dim=1))
    

    As you can see, every computational step is (in this case rather explicitly) spelled out, and very much possible. Again, once you want to scale the number of neurons per layer, you will have to be a little more creative in how you process them, but for-loops very much work in PyTorch as well. Note that this will in any case be much slower than a vanilla linear layer, though. If you can live with separately trained weights, you can also just define separate, smaller linear layers and arrange them in a more convenient fashion.
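
    Regarding the manual weight initialization from the question: since each node is its own sub-module, you can overwrite its parameters directly after constructing the network. A minimal sketch (the concrete values, data, and the loss/optimizer choice are only placeholders):

    import torch
    import torch.nn as nn

    net = CustomNetwork()

    # Manually set the weights of the top first-layer node; values are illustrative.
    with torch.no_grad():
        net.h_1_1[0].weight.copy_(torch.tensor([[ 0.5],
                                                [-0.5]]))  # shape (2, 1)
        net.h_1_1[0].bias.zero_()

    # One training step with made-up data.
    x = torch.randn(8, 2)
    y = torch.randn(8, 1)
    optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    optimizer.step()

    After training, the sign of each weight then tells you whether the corresponding connection contributes positively or negatively, which is the quantity you wanted to determine.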