I have 3 different actions (A, B and Nothing), and A and B each come with different powers (e.g. A100, A50, B100, B50). What is the best way to feed these actions to a NN in order to get the best results?
1- feed A/B to input 1, and the action power (100/50/Nothing) to input 2
2- feed A100/A50/Nothing to input 1, and B100/B50/Nothing to input 2
3- feed A100/A50 to input 1, B100/B50 to input 2, and a Nothing flag to input 3
4- Also, should I feed 100 & 50 as they are, or normalize them to 2 & 1?
I would like to know the reasons for choosing one method over the others.
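For concreteness, the three input layouts listed above could be sketched as follows. This is only an illustration of the options, not a recommendation: the numeric codes for A and B (+1 and -1), the use of 0 for Nothing, and the scaling of the power by 100 are all assumptions made for the example.

```python
# Actions from the question as (kind, power); None stands for "Nothing".
ACTIONS = [("A", 100), ("A", 50), ("B", 100), ("B", 50), None]


def encode_option1(action):
    """Option 1: input 1 = action kind, input 2 = power.

    Assumed codes: A = +1, B = -1, Nothing = 0; power is divided by 100.
    """
    if action is None:
        return [0.0, 0.0]
    kind, power = action
    return [1.0 if kind == "A" else -1.0, power / 100.0]


def encode_option2(action):
    """Option 2: one input per action kind, holding that action's scaled power.

    Nothing is represented by both inputs being zero.
    """
    if action is None:
        return [0.0, 0.0]
    kind, power = action
    scaled = power / 100.0
    return [scaled, 0.0] if kind == "A" else [0.0, scaled]


def encode_option3(action):
    """Option 3: same as option 2, plus an explicit Nothing flag as input 3."""
    return encode_option2(action) + [1.0 if action is None else 0.0]


for a in ACTIONS:
    print(a, encode_option1(a), encode_option2(a), encode_option3(a))
```

Writing the options out this way makes the differences visible: option 1 mixes a categorical value and a magnitude, while options 2 and 3 give each action kind its own input.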
What do you want to learn, and what should the output be? Is the input just the action that was used? If you are learning a model of the environment, that model is expressed as a probability distribution:
P(next_state|state, action)
It is common to use a separate model for each action. That makes the mapping between input and output simpler. The input is a vector of state features, and the output is a vector of the features of the next state. The action is not an input at all; it is implied by which model is used.
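A minimal sketch of the one-model-per-action idea, in plain Python. The `LinearModel` class is a hypothetical stand-in for whatever network you would actually use, and the toy transitions are made up purely for illustration.

```python
class LinearModel:
    """Tiny linear map state -> next_state, trained with the delta rule.

    Hypothetical stand-in for a real neural network.
    """

    def __init__(self, n_features, lr=0.1):
        self.w = [[0.0] * n_features for _ in range(n_features)]
        self.lr = lr

    def predict(self, state):
        return [sum(wi * s for wi, s in zip(row, state)) for row in self.w]

    def update(self, state, next_state):
        pred = self.predict(state)
        for i, row in enumerate(self.w):
            err = next_state[i] - pred[i]
            for j, s in enumerate(state):
                row[j] += self.lr * err * s


# One model per action; the action selects the model and is never an input.
ACTIONS = ["A100", "A50", "B100", "B50", "Nothing"]
models = {a: LinearModel(n_features=2) for a in ACTIONS}


def train(transitions, epochs=200):
    """Each (state, action, next_state) sample only updates that action's model."""
    for _ in range(epochs):
        for state, action, next_state in transitions:
            models[action].update(state, next_state)


# Made-up toy data: A100 swaps the two features, Nothing leaves the state as is.
transitions = [
    ([1.0, 0.0], "A100", [0.0, 1.0]),
    ([0.0, 1.0], "A100", [1.0, 0.0]),
    ([1.0, 0.0], "Nothing", [1.0, 0.0]),
    ([0.0, 1.0], "Nothing", [0.0, 1.0]),
]
train(transitions)

print(models["A100"].predict([1.0, 0.0]))
print(models["Nothing"].predict([1.0, 0.0]))
```

With this layout the question of how to encode the action on the input side disappears, at the cost of splitting the training data across five models.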
The state features could be encoded as bits. An active bit would indicate the presence of a feature.
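As a small illustration of the bit encoding, assuming a fixed, ordered list of feature names (the names below are hypothetical):

```python
# Hypothetical feature names; the order fixes which bit means what.
FEATURES = ["door_open", "has_key", "light_on"]


def encode_state(active):
    """Return one bit per feature: 1 if the feature is present, else 0."""
    return [1 if name in active else 0 for name in FEATURES]


def decode_state(bits):
    """Inverse of encode_state: the set of features whose bit is active."""
    return {name for name, bit in zip(FEATURES, bits) if bit}


print(encode_state({"has_key"}))
print(decode_state([1, 0, 1]))
```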
This setup would learn a deterministic model. I don't know a good way to learn a stochastic model of the next states. One possibility may be to use stochastic neurons.