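The following transitions have been observed in a Markov decision process. Try to determine the process from them. Each row lists the reward R, the action A, the next state S′, and the current state S.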
R A S′ S
0 U C B
-1 L E C
0 D C A
-1 R E C
0 D C A
+1 R D C
0 U C B
+1 R D C
I need to find the states, transitions, rewards, and transition probabilities. I've worked out everything except the probabilities, and I don't know how to compute them. If anyone can help, I just need to know where to start.
For state B, action U always results in the new state C. So, P(C|B,U)=1 (you might also argue that P(C|B)=1). P(D|C,R)=2/3, since in two out of three cases action R in state C resulted in D.
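The same counting can be done for every (state, action) pair at once. Below is a minimal sketch in Python, assuming the probabilities are estimated as relative frequencies: the number of times action a in state s led to s′, divided by the number of times a was taken in s. The names `observations` and `estimate_transition_probs` are made up for this illustration; the rows are copied from the table in the question.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Observed transitions from the question, in the table's column order:
# (reward, action, next_state, state).
observations = [
    (0, "U", "C", "B"),
    (-1, "L", "E", "C"),
    (0, "D", "C", "A"),
    (-1, "R", "E", "C"),
    (0, "D", "C", "A"),
    (1, "R", "D", "C"),
    (0, "U", "C", "B"),
    (1, "R", "D", "C"),
]


def estimate_transition_probs(rows):
    """Estimate P(next_state | state, action) by relative frequency."""
    # counts[(state, action)][next_state] = number of observed occurrences
    counts = defaultdict(Counter)
    for _reward, action, next_state, state in rows:
        counts[(state, action)][next_state] += 1

    probs = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        # Exact fractions keep results like 2/3 readable.
        probs[key] = {s2: Fraction(n, total) for s2, n in next_counts.items()}
    return probs


probs = estimate_transition_probs(observations)
for (state, action), dist in sorted(probs.items()):
    for next_state, p in sorted(dist.items()):
        print(f"P({next_state}|{state},{action}) = {p}")
```

On the data above this gives P(D|C,R) = 2/3 and P(E|C,R) = 1/3, and probability 1 for each of the other observed (state, action) pairs. With only eight samples these are rough estimates, not the true probabilities of the process.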