artificial-intelligence, policy, reinforcement-learning, markov-decision-process

Determine an MDP from observed transitions


The following transitions have been observed in a Markov decision process. Try to determine the process.

 R   A   S′  S
 0   U   C   B
-1   L   E   C
 0   D   C   A
-1   R   E   C
 0   D   C   A
+1   R   D   C
 0   U   C   B
+1   R   D   C

(R = reward, A = action, S′ = resulting state, S = state in which the action was taken)

I need to find the states, transitions, rewards and transition probabilities. I've worked out everything except the probabilities, and I don't know how to compute them. If anyone can help, I just need to know where to start.


Solution

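  • For state B, action U always results in the new state C, so P(C|B,U) = 1 (since U is the only action observed in B, you might also argue that P(C|B) = 1). In general, estimate each probability by counting: P(s′|s,a) = N(s,a,s′) / N(s,a), the fraction of times that taking action a in state s led to s′. For example, action R was taken in state C three times, leading to D twice and to E once, so P(D|C,R) = 2/3 and P(E|C,R) = 1/3.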
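The counting rule above can be applied mechanically to the whole table. Here is a minimal sketch, assuming the probabilities are simple maximum-likelihood estimates (relative frequencies) and that the reward may depend on the full (s, a, s′) triple. The names `OBSERVED` and `estimate_mdp` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Observed transitions as (reward, action, next_state, state), in the same
# column order as the table in the question (R, A, S', S).
OBSERVED = [
    (0, "U", "C", "B"),
    (-1, "L", "E", "C"),
    (0, "D", "C", "A"),
    (-1, "R", "E", "C"),
    (0, "D", "C", "A"),
    (+1, "R", "D", "C"),
    (0, "U", "C", "B"),
    (+1, "R", "D", "C"),
]


def estimate_mdp(samples):
    """Hypothetical helper: relative-frequency estimates of P(s'|s,a)
    and the mean observed reward for each (s, a, s') triple."""
    sa_counts = Counter()           # N(s, a): times action a was taken in s
    sas_counts = Counter()          # N(s, a, s'): times that led to s'
    reward_sums = defaultdict(int)  # total reward seen for (s, a, s')
    for r, a, s_next, s in samples:
        sa_counts[(s, a)] += 1
        sas_counts[(s, a, s_next)] += 1
        reward_sums[(s, a, s_next)] += r

    # P(s'|s,a) = N(s,a,s') / N(s,a), kept as exact fractions.
    probs = {
        (s, a, s_next): Fraction(n, sa_counts[(s, a)])
        for (s, a, s_next), n in sas_counts.items()
    }
    # Average reward per observed (s, a, s') triple.
    rewards = {
        key: Fraction(total, sas_counts[key])
        for key, total in reward_sums.items()
    }
    return probs, rewards


probs, rewards = estimate_mdp(OBSERVED)
for (s, a, s_next), p in sorted(probs.items()):
    print(f"P({s_next}|{s},{a}) = {p}   R = {rewards[(s, a, s_next)]}")
```

Running it prints:

```
P(C|A,D) = 1   R = 0
P(C|B,U) = 1   R = 0
P(E|C,L) = 1   R = -1
P(D|C,R) = 2/3   R = 1
P(E|C,R) = 1/3   R = -1
```

With only eight samples these are rough estimates; more observations of the same (s, a) pairs would sharpen them.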