reinforcement-learning markov markov-decision-process

What is a terminal state in gridworld?


I am learning about Markov decision processes, and I don't know where to mark the terminal states.

In a 4x3 grid world, I marked the states that I think are terminal with T (I might be wrong).

I saw an instruction that marks the terminal states as follows.

terminals=[(3, 2), (3, 1)]

Can someone explain how this works?


Solution

  • In the given grid world, you start at "start", which is (0,0), and move around from there. If you reach "end +1" at (3,2), the reward is +1 and the game ends. Likewise, if you reach "end -1" at (3,1), the reward is -1 and the game ends. That is what terminals=[(3, 2), (3, 1)] encodes: the coordinates of the "end +1" and "end -1" states. While moving around, you can't enter (1,1) because it is an invalid state. Also, if you reach either of the terminal states marked "T", at (2,0) and (2,1), the game ends with zero reward.
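To make the role of the terminal list concrete, here is a minimal sketch of such a grid world in Python. It is not the code the instruction came from: the names `step`, `run_episode`, `rewards`, `WALL` and `MOVES` are hypothetical, moves are assumed to be deterministic, and coordinates are assumed to be (x, y) with (0,0) at the bottom-left. Only `terminals=[(3, 2), (3, 1)]` is taken from the question. Extra zero-reward terminal states could be modelled by appending them to `terminals` without adding an entry to `rewards`.

```python
# Illustrative sketch only; all names except `terminals` are assumptions.
WIDTH, HEIGHT = 4, 3                     # 4 columns (x = 0..3), 3 rows (y = 0..2)
START = (0, 0)
WALL = (1, 1)                            # invalid state, can never be entered
terminals = [(3, 2), (3, 1)]             # taken from the question
rewards = {(3, 2): 1.0, (3, 1): -1.0}    # reward on entering a state; 0 elsewhere

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def step(state, action):
    """Return (next_state, reward, done) for one deterministic move."""
    if state in terminals:
        # The game is already over: nothing moves, no further reward.
        return state, 0.0, True
    dx, dy = MOVES[action]
    nxt = (state[0] + dx, state[1] + dy)
    # Moves off the grid or into the wall leave the agent where it is.
    off_grid = not (0 <= nxt[0] < WIDTH and 0 <= nxt[1] < HEIGHT)
    if off_grid or nxt == WALL:
        nxt = state
    return nxt, rewards.get(nxt, 0.0), nxt in terminals


def run_episode(actions, state=START):
    """Apply actions in order, stopping as soon as a terminal state is reached."""
    total = 0.0
    for action in actions:
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return state, total
```

For example, `run_episode(["up", "up", "right", "right", "right"])` walks along the left and top edges and stops at (3, 2); any actions supplied after that point would be ignored, which is exactly what being a terminal state means here.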