reinforcement-learning, markov-chains, markov-models

What are the states and rewards in the reward matrix?


This code:

import numpy as ql  # ql is the alias MDP.py uses for numpy

R = ql.matrix([ [0,0,0,0,1,0],
        [0,0,0,1,0,1],
        [0,0,100,1,0,0],
        [0,1,1,0,1,0],
        [1,0,0,1,0,0],
        [0,1,0,0,0,0] ])

is from:

https://github.com/PacktPublishing/Artificial-Intelligence-By-Example/blob/47bed1a88db2c9577c492f950069f58353375cfe/Chapter01/MDP.py

R is defined as the "Reward matrix for each state". What are the states and rewards in this matrix?

# Reward for state 0
print('R[0,]:', R[0,])

# Reward for state 1
print('R[1,]:', R[1,])

prints:

R[0,]: [[0 0 0 0 1 0]]
R[1,]: [[0 0 0 1 0 1]]

Is [0 0 0 0 1 0] state 0 and [0 0 0 1 0 1] state 1?


Solution

  • According to the book this example comes from, R represents the reward of each transition from a current state s to a next state s'.

    Specifically, R is associated with the following graph:

    (Figure: a graph of the six states A to F, with an edge between two states for each non-zero entry of R.)

    Each row of the matrix R corresponds to a letter from A to F, and each column does as well. The 1 values are the edges of the graph. For example, R[0,]: [[0 0 0 0 1 0]] means that you can go from state s = A to next state s' = E and receive a reward of 1. Similarly, R[1,]: [[0 0 0 1 0 1]] means that you receive a reward of 1 if you go from B to D or from B to F. The goal seems to be reaching and then remaining in C, since the self-transition C to C carries the largest reward (100), as the short sketch below illustrates.
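
    A minimal sketch of how to read the matrix, assuming the same import numpy as ql aliasing as MDP.py (the letter labels A to F are only a reading aid added here, not part of the original script):

    import numpy as ql

    # Reward matrix: rows are current states s, columns are next states s'
    R = ql.matrix([[0, 0, 0,   0, 1, 0],
                   [0, 0, 0,   1, 0, 1],
                   [0, 0, 100, 1, 0, 0],
                   [0, 1, 1,   0, 1, 0],
                   [1, 0, 0,   1, 0, 0],
                   [0, 1, 0,   0, 0, 0]])

    labels = ['A', 'B', 'C', 'D', 'E', 'F']  # row/column index -> state letter

    # Every non-zero entry R[s, s'] is an edge of the graph:
    # moving from state s to state s' is allowed and yields that reward.
    for s in range(R.shape[0]):
        for s_next in range(R.shape[1]):
            reward = R[s, s_next]
            if reward > 0:
                print(labels[s], '->', labels[s_next], 'reward:', reward)

    Running this lists A -> E, B -> D, B -> F, and so on, and the single entry of 100 on C -> C shows why reaching and staying in C is the goal.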