
What are the states and rewards in the reward matrix?

This code:

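import numpy as ql  # assumption: ql is the NumPy alias used by the source

R = ql.matrix([ [0,0,0,0,1,0],
                # ... rows 1 to 4 not shown ...
                [0,1,0,0,0,0] ])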

is from:

# Reward for state 0
print('R[0,]:', R[0,])
print('R[1,]:', R[1,])

prints:

R[0,]: [[0 0 0 0 1 0]]
R[1,]: [[0 0 0 1 0 1]]

Is [0 0 0 0 1 0] state 0 and [0 0 0 1 0 1] state 1?


  • According to the book that uses this example, R holds the reward for each transition from a current state s to a next state s'.

    Specifically, R describes a graph whose nodes are the states A to F.

    Each row of R corresponds to a current state from A to F, and each column to a next state from A to F. The 1 values represent the edges of the graph, not its nodes. For example, R[0,]: [[0 0 0 0 1 0]] means that you can go from state s=A to next state s'=E and receive a reward of 1. Similarly, R[1,]: [[0 0 0 1 0 1]] means that you receive a reward of 1 if you go from B to D or from B to F. So yes: row 0 holds the rewards for leaving state 0 (A), and row 1 holds the rewards for leaving state 1 (B). The goal seems to be reaching and staying in C, which yields the largest reward.
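    A minimal sketch of reading the printed rows the way the answer describes, assuming the row index is the current state s, the column index is the next state s', and states 0 to 5 are labelled A to F. It uses plain Python lists instead of `ql.matrix`, and `STATES` and `rewarded_moves` are hypothetical names for illustration. Only rows A and B are included, since the rest of R is not shown in the question.

```python
# Hypothetical labels: state index 0..5 -> letter A..F (assumption from the answer).
STATES = "ABCDEF"

# Only the two rows printed in the question; the remaining rows of R are not shown.
R = [
    [0, 0, 0, 0, 1, 0],  # row 0: rewards for leaving state A
    [0, 0, 0, 1, 0, 1],  # row 1: rewards for leaving state B
]


def rewarded_moves(R, s):
    """Return (next_state_label, reward) for every nonzero entry in row s of R."""
    return [(STATES[s_next], r) for s_next, r in enumerate(R[s]) if r != 0]


# List every rewarded transition s -> s' encoded in the rows we have.
for s in range(len(R)):
    for s_next_label, r in rewarded_moves(R, s):
        print(f"{STATES[s]} -> {s_next_label}: reward {r}")

# Looking up a single transition reward R[s][s'], here B -> F.
print("R[B][F] =", R[STATES.index("B")][STATES.index("F")])
```

    Running it prints A -> E, B -> D and B -> F, each with reward 1, followed by R[B][F] = 1, which matches the row-as-current-state interpretation above.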