I want to learn a bit more about other approaches to training neural networks. I can find a fair bit of literature on using GAs to train a network, but not much on PSO training. How does that work?
I have a general idea: you create a swarm of some number of particles and use the network loss function (e.g. MSE) as the fitness function. Particles move toward areas where the MSE is lowest, and the best position found gives you the weights for the network.
I understand that for an online vanilla back-propagation network, the general training procedure is:
for each epoch:
    for each training example d:
        feed-forward d through layers 0..n
        find error e as a function of expected vs. actual output
        back-propagate e through layers n..0
        update weights w as a function of w, e, learning and momentum rates
    endfor
endfor
I just can't find much info on using PSO to train neural networks or where it fits into the algorithm. Beyond my threadbare (and perhaps incorrect) assumption, I don't know if it's meant for online or batch learning, how the error is found for inner layers without BP, whether PSO replaces or accompanies BP, etc.
I'd love a push in the right direction but not necessarily code as I'm more interested in learning about it first before implementation.
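Just for posterity and in case someone else comes across this question: PSO is integrated into neural networks by replacing BP for training. Using the MSE error function along with a set of training examples, you have a continuous search space (which you can bound by limiting the weight range) and a fitness function, which is exactly what PSO needs. Because PSO only needs the scalar fitness value for each candidate set of weights, no error has to be propagated back to the inner layers.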
initialize a set of random particles in n dimensions (n = number of weights in the network)
perform PSO using the swarm of particles
    the PSO fitness function is the network's MSE function
    the MSE function uses a feed-forward pass over the training examples to compute the error between actual and target outputs
    over time, particles (each an encoding of the weights) will move toward a minimum of the MSE
return the best particle after a set number of iterations and set the network weights to its position
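For anyone who wants to see those steps in concrete form, here is a minimal sketch in plain Python. Everything specific in it is my own assumption for illustration: the toy 2-2-1 XOR network, the names `forward`, `mse` and `pso_train`, the global-best swarm topology, and the inertia and acceleration values (0.72 and 1.49 are commonly quoted defaults, not something tuned for this problem). It also does not clamp positions or velocities, which a more careful implementation might do.

```python
import math
import random

# Hypothetical toy problem (not from the original post): a 2-2-1 network on XOR.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_WEIGHTS = 9  # 4 hidden weights + 2 hidden biases + 2 output weights + 1 output bias


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-x))


def forward(w, x):
    """Feed-forward pass; w is a flat list of 9 weights (one particle position)."""
    h0 = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])
    h1 = sigmoid(w[3] * x[0] + w[4] * x[1] + w[5])
    return sigmoid(w[6] * h0 + w[7] * h1 + w[8])


def mse(w):
    """Fitness: mean squared error over the whole training set."""
    return sum((forward(w, x) - t) ** 2 for x, t in DATA) / len(DATA)


def pso_train(n_particles=30, n_iters=200, inertia=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO over the weight vector; returns (best_weights, best_mse)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)] for _ in range(n_particles)]
    vel = [[0.0] * N_WEIGHTS for _ in range(n_particles)]
    pbest = [p[:] for p in pos]           # each particle's best position so far
    pbest_err = [mse(p) for p in pos]
    g = min(range(n_particles), key=pbest_err.__getitem__)
    gbest, gbest_err = pbest[g][:], pbest_err[g]  # swarm's best position so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(N_WEIGHTS):
                r1, r2 = rng.random(), rng.random()
                # velocity = inertia + pull toward personal best + pull toward global best
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            err = mse(pos[i])  # fitness needs only a forward pass, no gradients
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    return gbest, gbest_err


if __name__ == "__main__":
    weights, err = pso_train()
    print("best MSE:", err)
    for x, t in DATA:
        print(x, "target", t, "output", round(forward(weights, x), 3))
```

The returned `weights` list is what you would load into the network. Note that the fitness here is computed over the full training set on every evaluation, so this sketch is closer to batch learning than to the online loop in the question.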
There are other ways to use PSO in conjunction with neural networks, such as hyperparameter selection or model structure selection. I was most interested in training, however.