Tags: machine-learning, artificial-intelligence, epsilon, q-learning, sarsa

Effect of different epsilon values on Q-learning and SARSA


Since I am a beginner in this field, I have a question about how different epsilon values affect SARSA and Q-learning when the epsilon-greedy algorithm is used for action selection.

I understand that when epsilon is equal to 0, actions are always chosen based on the policy derived from Q. Q-learning first updates Q and then selects the next action based on the updated Q, whereas SARSA chooses the next action first and updates Q afterwards.
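For reference, these are the standard update rules I am comparing (with learning rate α and discount factor γ); in SARSA the next action $a_{t+1}$ is itself chosen by the ε-greedy policy:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right] \quad \text{(Q-learning)}$$

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right] \quad \text{(SARSA)}$$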

What about when ε is equal to 1? And what happens as ε increases from 0 to 1?

Thank you!


Solution

  • The ε-greedy policy selects a random action with probability ε, or the best-known action with probability 1-ε. At ε=1 it always picks a random action. This value controls the trade-off between exploration and exploitation: you want to use the knowledge you already have, but you also want to search for better alternatives. See the sketch below.
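A minimal sketch of ε-greedy selection (assuming a tabular Q stored as a NumPy array of shape `[n_states, n_actions]`; the function name and the toy setup are just for illustration):

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng):
    """Choose an action for `state`: random with probability epsilon,
    otherwise the greedy (best-known) action from the Q-table."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # explore: uniform random action
    return int(np.argmax(Q[state]))           # exploit: highest-value action

# Hypothetical tabular setup: 5 states, 3 actions, all values zero.
rng = np.random.default_rng(seed=0)
Q = np.zeros((5, 3))

# epsilon=0.0 always exploits; epsilon=1.0 always explores;
# values in between mix the two.
action = epsilon_greedy(Q, state=0, epsilon=0.1, rng=rng)
```

At ε=0 this reduces to pure exploitation and at ε=1 to pure random exploration. Note that Q-learning and SARSA can use the exact same selection rule; they differ only in how Q is updated afterwards, as in the update rules above.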