reinforcement-learning, stochastic-process, markov-decision-process

Following action a from state s, is the outcome probabilistic or deterministic?


I am struggling to understand one aspect of the Markov Decision Process.

When I am in state s and take action a, is the transition to the next state s' deterministic or stochastic?

In most examples it seems to be deterministic. However, I found one example, in the picture below (from David Silver's lecture on RL), where the transition is stochastic, namely the one following the action "Pub".

graph


Solution

  • In general, the transitions between states in a Markov Decision Process can be stochastic. The probability of transitioning to another state is usually denoted P_a(s, s'), where s is the current state, s' is the next state, and a is the action performed.

    The deterministic case is a special case of the stochastic one. If, for a given s and a, P_a(s, s') is equal to 1 for one state s' and 0 for all the remaining states, we have a deterministic transition.
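
To make the distinction concrete, here is a minimal Python sketch of a transition table P_a(s, s'). The state names, action names, and probabilities are made up for illustration and are not taken from the lecture. The helper functions `is_deterministic` and `sample_next_state` are likewise hypothetical names, not part of any library.

```python
import random

# Hypothetical transition table: P[(s, a)] maps each next state s' to P_a(s, s').
# All names and numbers here are illustrative assumptions.
P = {
    # Stochastic: the same (s, a) pair can lead to several next states.
    ("s0", "stochastic_action"): {"s1": 0.2, "s2": 0.4, "s3": 0.4},
    # Deterministic: all probability mass sits on a single next state.
    ("s0", "deterministic_action"): {"s1": 1.0},
}


def is_deterministic(dist):
    """Return True if one next state has probability 1 (all others are 0)."""
    return any(p == 1.0 for p in dist.values())


def sample_next_state(P, s, a, rng):
    """Draw a next state s' according to P_a(s, s')."""
    dist = P[(s, a)]
    states = list(dist)
    weights = [dist[t] for t in states]
    return rng.choices(states, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so that runs are repeatable
    for action in ("stochastic_action", "deterministic_action"):
        draws = [sample_next_state(P, "s0", action, rng) for _ in range(5)]
        print(action, is_deterministic(P[("s0", action)]), draws)
```

Under these assumptions, repeated draws for the deterministic action always return the same next state, while draws for the stochastic action vary according to the listed probabilities. In both cases the probabilities over s' sum to 1.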