Tags: algorithm, reinforcement-learning, q-learning

Confusion in selecting the reward in Q-learning


I am new to Q-learning (QL) and am trying to implement a small task with it in MATLAB. The task: there is one transmitter, one receiver, and 10 relays between them. I want to use QL to choose the one relay that will carry the signal from the transmitter to the receiver successfully.

As per QL theory, we need to define the state, action, and reward. I have chosen them as follows. State: [P1,...,P10], where P1 is the power from the 1st relay to the receiver and, likewise, P10 is the power from the 10th relay to the receiver.

Action: [1,...,10], where the action is choosing the relay that has the highest power at that time.

My question: how should I choose the reward in this case?

Any help in this regard will be highly appreciated.


Solution

  • There is only one state (i.e., this is actually a multi-armed bandit problem).

    There are ten actions, one per relay.

    The reward for each action is the power received from the corresponding relay.
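
The bandit formulation above can be sketched as follows. This is a minimal illustration in Python rather than MATLAB, and it rests on assumptions that are not in the question: the `mean_powers` values are made up, the measured power is modelled as the mean plus Gaussian noise, and exploration uses an epsilon-greedy rule. With a single state, the Q-learning update reduces to keeping a running average of the reward seen for each relay.

```python
import random


def run_bandit(mean_powers, steps=5000, epsilon=0.1, noise_std=0.1, seed=0):
    """Epsilon-greedy bandit: one action per relay, reward = measured power.

    mean_powers, noise_std and epsilon are hypothetical values for illustration.
    """
    rng = random.Random(seed)
    n = len(mean_powers)
    q = [0.0] * n      # estimated power (value) of each relay
    counts = [0] * n   # number of times each relay was chosen

    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                   # explore: random relay
        else:
            a = max(range(n), key=lambda i: q[i])  # exploit: best estimate so far

        # Assumed measurement model: true mean power plus Gaussian noise.
        reward = rng.gauss(mean_powers[a], noise_std)

        # Incremental sample-average update of the chosen relay's estimate.
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]

    return q, counts


# Hypothetical mean received powers for the 10 relays (arbitrary units).
mean_powers = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.6, 0.7, 0.35, 0.15]
q, counts = run_bandit(mean_powers)
best = max(range(len(q)), key=lambda i: q[i])
print("Estimated best relay:", best + 1)
```

If the relay powers drift over time, one option is to replace the `1 / counts[a]` step size with a small constant so that recent measurements weigh more; that is a design choice, not something the answer above prescribes.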