reinforcement-learning mdp

Reinforcement Learning without Successor State


I'm attempting to pose a problem as a reinforcement learning problem. My difficulty is that the state the agent is in changes randomly. The agent must simply choose an action within whatever state it is in. I want to learn appropriate actions for all states based on the reward the agent receives for performing actions.

Question:

Is this a specific type of RL problem? If there is no successor state, how would one calculate the value of a state?


Solution

  • If the state really changes randomly, that is, if there is no relationship between the action taken and the following state, then all you can do is record the rewards and average them separately for each state-action pair.
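
A setting like this, where the action affects only the immediate reward and not the next state, is commonly called a contextual bandit. The averaging described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name `RewardAverager`, its methods, and the example states, actions, and rewards are all hypothetical, and untried actions are assumed to have an average of 0.0.

```python
from collections import defaultdict


class RewardAverager:
    """Running average of observed reward for each (state, action) pair."""

    def __init__(self):
        self.counts = defaultdict(int)   # number of observations per pair
        self.means = defaultdict(float)  # current average reward per pair

    def update(self, state, action, reward):
        key = (state, action)
        self.counts[key] += 1
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.means[key] += (reward - self.means[key]) / self.counts[key]

    def best_action(self, state, actions):
        # Pick the action with the highest average so far in this state.
        # Assumption: an action never tried in this state counts as 0.0.
        return max(actions, key=lambda a: self.means.get((state, a), 0.0))


# Hypothetical log of (state, action, reward) observations.
log = [
    ("s0", "left", 1.0),
    ("s0", "left", 0.0),
    ("s0", "right", 2.0),
    ("s1", "left", 3.0),
]

avg = RewardAverager()
for state, action, reward in log:
    avg.update(state, action, reward)

choice = avg.best_action("s0", ["left", "right"])
```

Always picking the current best action can lock in an early, unlucky estimate, so in practice one would probably also try other actions occasionally (for example, with a small random probability) so that every state-action pair keeps getting sampled.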