artificial-intelligence, probability, reinforcement-learning, expert-system, markov-decision-process

What do we mean by "controllable actions" in a POMDP?


I have some questions related to POMDPs.

  1. What do we mean by controllable actions in a partially observable Markov decision process (POMDP)? And what does it mean to say there are no controllable actions in a hidden Markov model (HMM)?

  2. When computing policies through value or policy iteration, could we say that the POMDP is an expert system (because we model the environment)? And when using Q-learning, do we get a more flexible system in terms of intelligence, or adaptability to a changing environment?


Solution

  • Actions

    Controllable actions are the results of choices that the decision maker makes. In the classic POMDP tiger problem, there is a tiger hidden behind one of two doors. At each time step, the decision maker can choose to listen or to open one of the doors. The actions in this scenario are {listen, open left door, open right door}. The transition function from one state to another depends on both the previous state and the action chosen.

    In a hidden Markov model (HMM), there are no actions for the decision maker. In the tiger problem context, this means the participant can only listen without opening doors. In this case, the transition function only depends on the previous state, since there are no actions.
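
    To make the contrast concrete, here is a minimal Python sketch of the tiger problem's dynamics, assuming the standard parameters from Kaelbling, Littman, and Cassandra (1998), where listening is 85% accurate and opening a door resets the episode; the function names are illustrative:

    ```python
    import random

    STATES = ["tiger-left", "tiger-right"]
    ACTIONS = ["listen", "open-left", "open-right"]

    def pomdp_transition(state, action):
        """POMDP transition: the next state depends on BOTH state and action."""
        if action == "listen":
            return state                  # listening leaves the tiger where it is
        return random.choice(STATES)      # opening a door resets the episode

    def pomdp_observation(state, action):
        """Noisy signal: listening reports the correct door 85% of the time."""
        if action != "listen":
            return "door-opened"
        correct = "hear-left" if state == "tiger-left" else "hear-right"
        wrong = "hear-right" if correct == "hear-left" else "hear-left"
        return correct if random.random() < 0.85 else wrong

    def hmm_transition(state):
        """HMM transition: no action argument; the chain evolves on its own."""
        return state
    ```

    Dropping the action argument from the transition (and observation) functions is exactly what reduces the POMDP to an HMM.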

    For more details on the tiger problem, see Section 5.1 of Kaelbling, Littman, and Cassandra's 1998 POMDP paper, "Planning and acting in partially observable stochastic domains". There's also a more introductory walk-through available in this tutorial.

  • Adaptability

    The basic intuition in your question is correct, but it can be refined. POMDPs are a class of models, whereas Q-learning is a solution technique. The underlying distinction in your question is between model-based and model-free approaches. POMDPs are model-based, although the partial observability introduces additional uncertainty. Reinforcement learning can be applied in a model-free setting, for example with Q-learning. The model-free approach will be more flexible for non-stationary problems. That said, depending on the complexity of the problem, you could instead incorporate the non-stationarity into the model itself and treat it as an MDP.
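
    For contrast, here is a minimal tabular Q-learning sketch. The `env` object, with its `reset()`/`step()` interface and `actions` attribute, is an assumption for illustration, not any particular library's API:

    ```python
    from collections import defaultdict
    import random

    def q_learning(env, episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
        """Model-free control: learns action values from sampled transitions,
        without ever representing the transition function explicitly."""
        Q = defaultdict(float)                   # Q[(state, action)] -> value
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    action = random.choice(env.actions)
                else:
                    action = max(env.actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)  # sampled experience
                # the update never consults a model, which is what makes the
                # approach adaptable when the environment's dynamics drift
                best_next = max(Q[(next_state, a)] for a in env.actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next
                                               - Q[(state, action)])
                state = next_state
        return Q
    ```

    A model-based planner would instead consume the transition and observation functions directly (as in the tiger sketch above) and compute a policy offline via value or policy iteration.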

    There's a very thorough discussion of these non-stationary modelling trade-offs in the answer to this question.

    Lastly, it is correct that POMDPs can be considered expert systems. Mazumdar et al. (2017) have suggested treating Markov decision processes (MDPs) as expert systems.