This code:
R = ql.matrix([ [0,0,0,0,1,0],
                [0,0,0,1,0,1],
                [0,0,100,1,0,0],
                [0,1,1,0,1,0],
                [1,0,0,1,0,0],
                [0,1,0,0,0,0] ])
is from:
R is defined as the "Reward matrix for each state". What are the states and rewards in this matrix?
# Reward for state 0
print('R[0,]:' , R[0,])
# Reward for state 1
print('R[1,]:' , R[1,])
prints:
R[0,]: [[0 0 0 0 1 0]]
R[1,]: [[0 0 0 1 0 1]]
Is [0 0 0 0 1 0] state 0 and [0 0 0 1 0 1] state 1?
According to the book that uses that example, R represents the rewards of the transitions from a current state s to a next state s': the entry R[s, s'] is the reward for moving from s to s'.
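A minimal sketch of this reading, assuming `ql` is NumPy imported under an alias (the snippet does not show the import). It uses a plain `np.array` instead of `np.matrix`; `np.matrix` always keeps slices two-dimensional, which is why `R[0,]` prints with double brackets in the question. The helper name `reward` is hypothetical, chosen only for illustration.

```python
import numpy as np

# Assumption: `ql` in the question is NumPy under another name;
# a plain ndarray gives the same values as ql.matrix for this purpose.
R = np.array([[0, 0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 100, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0]])

def reward(s, s_next):
    """Hypothetical helper: reward for moving from state s (row) to state s_next (column)."""
    return int(R[s, s_next])

# Every nonzero entry is an allowed transition (s, s_next, reward), in row-major order.
transitions = [(int(s), int(t), reward(s, t)) for s, t in zip(*np.nonzero(R))]

print(reward(0, 4))  # 1: state 0 -> state 4
print(reward(2, 2))  # 100: state 2 -> state 2
print(reward(0, 1))  # 0: no transition from state 0 to state 1
print(transitions)
```

Indexing by (row, column) rather than by row alone makes the "current state, next state" reading explicit: a row on its own is not a state's reward, but the list of rewards for every possible next state.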
Specifically, R is associated with the following graph:
Each row of the matrix R corresponds to a state labelled with a letter from A to F, and each column likewise corresponds to a state from A to F. The nonzero values mark the edges of the graph, i.e. the allowed transitions. For example, R[0,]: [[0 0 0 0 1 0]] means that you can go from state s=A to the next state s'=E and receive a reward of 1. Similarly, R[1,]: [[0 0 0 1 0 1]] means that you receive a reward of 1 if you go from B to D or to F. The goal seems to be reaching and staying in C, because the self-transition from C to C (R[2,2] = 100) carries by far the largest reward.
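The sketch below rebuilds the letter-labelled graph from R and then runs a simple deterministic Q-value iteration to check the claim about C. The discount factor `GAMMA = 0.8` and the names `edges`, `policy` and `greedy_path` are assumptions for illustration, not taken from the book; an action is modelled as "move to next state s'", allowed only where R is nonzero.

```python
import numpy as np

STATES = "ABCDEF"  # row/column i corresponds to the letter STATES[i]

R = np.array([[0, 0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 100, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0]], dtype=float)

ALLOWED = R > 0  # an edge s -> s' exists wherever the reward is nonzero
GAMMA = 0.8      # assumed discount factor (not from the book)

# Adjacency list in letter form, e.g. 'B': ['D', 'F']
edges = {STATES[s]: [STATES[t] for t in np.flatnonzero(ALLOWED[s])] for s in range(6)}

# Deterministic Q-value iteration: Q[s, a] = R[s, a] + GAMMA * max_a' Q[a, a'],
# where the action a is the next state and only allowed moves are considered.
Q = np.zeros_like(R)
while True:
    best_next = np.where(ALLOWED, Q, -np.inf).max(axis=1)  # best value reachable from each state
    Q_new = np.where(ALLOWED, R + GAMMA * best_next[np.newaxis, :], 0.0)
    converged = np.abs(Q_new - Q).max() < 1e-9
    Q = Q_new
    if converged:
        break

# Greedy policy: from each state, move to the allowed next state with the highest Q value.
policy = np.where(ALLOWED, Q, -np.inf).argmax(axis=1)

def greedy_path(start, steps=5):
    """Follow the greedy policy from a starting letter for a fixed number of moves."""
    s = STATES.index(start)
    path = [start]
    for _ in range(steps):
        s = int(policy[s])
        path.append(STATES[s])
    return path

print(edges)
print({STATES[s]: STATES[int(a)] for s, a in enumerate(policy)})
print(greedy_path("A"))  # ['A', 'E', 'D', 'C', 'C', 'C']
```

Under these assumptions every state's greedy move leads toward C (for example A to E to D to C, or F to B to D to C), and once in C the policy keeps choosing C to C, whose value converges to 100 / (1 - 0.8) = 500.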