TL;DR: Does the NEAT algorithm allow its input/output layers to also evolve activation functions, or do they only use identity?
I'm working on a custom NeuralNet, largely inspired by the implementation of NEAT (NeuroEvolution of Augmenting Topologies).
As far as my experience and knowledge go, input neurons in most networks pass their values through unchanged (identity function), and output-layer neurons have activation functions that are preset based on the problem the network is trying to solve - usually identity, softmax, or sigmoid.
In the NEAT algorithm, do the input/output nodes evolve their activation functions, or are they fixed?
Yes, NEAT allows activation functions to "evolve". New nodes are inserted with a random activation function (you can choose which activation functions are available for this). They don't "evolve" in the sense of an activation function changing continuously; rather, different nodes can have different activation functions, and existing nodes can mutate to a different activation function.
https://neat-python.readthedocs.io/en/latest/config_file.html#defaultgenome-section
activation_default: The default activation function attribute assigned to new nodes. If none is given, or “random” is specified, one of the activation_options will be chosen at random.
activation_mutate_rate: The probability that mutation will replace the node’s activation function with a randomly-determined member of the activation_options. Valid values are in [0.0, 1.0].
activation_options: A space-separated list of the activation functions that may be used by nodes. This defaults to sigmoid. The built-in available functions can be found in Overview of builtin activation functions; more can be added as described in Customizing Behavior.
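For illustration, a genome section that enables this kind of mutation might look like the sketch below; the mutate rate and the list of options are arbitrary choices here, not library defaults beyond what the documentation above states.

```
[DefaultGenome]
# let new nodes pick a random activation from the options below
activation_default      = random
# probability that a mutation swaps a node's activation function (illustrative value)
activation_mutate_rate  = 0.1
# space-separated list of allowed activation functions
activation_options      = sigmoid tanh relu
```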
Regarding your other statement:
As far as my experience and knowledge go, input neurons in most networks pass their values through unchanged (identity function), and output-layer neurons have activation functions that are preset based on the problem the network is trying to solve - usually identity, softmax, or sigmoid.
Actually, it's the norm to have activation functions other than identity; there is a lot of theory on this topic.
The gist of it is that deeper networks can be more efficient than shallow networks. If you only use the identity function as an activation function, a neural network of arbitrary depth can be rewritten as an equivalent shallow network, so a deep network with identity activations gives you virtually no benefit over a shallow one.
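A quick numerical sketch of that collapse (plain NumPy, arbitrary weight matrices): two stacked layers with identity activations compute exactly the same function as a single layer whose weights are the product of the two.

```
import numpy as np

rng = np.random.default_rng(0)

# two "layers" with identity activation: y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)
x = rng.standard_normal(3)

deep = W2 @ (W1 @ x + b1) + b2

# the same function expressed as a single shallow layer
W, b = W2 @ W1, W2 @ b1 + b2
shallow = W @ x + b

print(np.allclose(deep, shallow))  # True
```

Any non-linear activation (sigmoid, tanh, relu, ...) breaks this equivalence, which is why non-identity activations are the norm for hidden nodes and often for output nodes as well.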