dynamic-programming, markov-chains, reinforcement-learning, markov-models, control-theory

Continuous-time finite-horizon MDP


Is there any algorithm for solving a finite-horizon semi-Markov decision process (semi-MDP)?

I want to find the optimal policy for a sequential decision problem with a finite action space, a finite state space, and a deadline. Critically, different actions take different amounts of time to complete, and for one of the actions this duration is stochastic. I can model time as either discrete or continuous, depending on which methods are available.

I am aware of algorithms for discounted infinite-horizon semi-MDPs, but I cannot find any work on finite-horizon semi-MDPs. Has this class of problems been studied before?


Solution

  • As with almost any MDP, backward dynamic programming should work. You can discretize the finite horizon into small steps from 0 to the deadline and then recursively update the values, starting at the deadline and working backward. The state space will need to track the current action, the total time spent on that action, and the set of already completed actions, so the number of possible states may be quite large.

    In the dynamic program you may be able to exploit the semi-Markov structure: once an action is chosen, you can jump directly to the value function of the resulting state at the (possibly random) time the action completes, taking an expectation over its duration. That way the time already spent on the current action never has to be carried in the state; see the sketch below.
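    Below is a minimal sketch of this backward recursion in Python. All problem data (the states, actions, duration distributions, transition probabilities, and rewards) are hypothetical placeholders, not from the question, and a duration realization that would overrun the deadline is simply treated as earning nothing. The completion-time jump from the previous paragraph replaces any explicit bookkeeping of elapsed time within an action.

        # Backward DP for a finite-horizon semi-MDP on a discretized horizon.
        # All problem data below are hypothetical placeholders.

        T = 20                      # deadline, measured in time steps
        states = ["s0", "s1"]       # finite state space
        actions = ["fast", "slow"]  # finite action space

        # duration_dist[a]: maps a duration (in steps) to its probability;
        # "slow" has a stochastic duration, "fast" a deterministic one.
        duration_dist = {
            "fast": {1: 1.0},
            "slow": {2: 0.5, 4: 0.5},
        }

        # transition[(s, a)]: maps a successor state to its probability.
        transition = {
            ("s0", "fast"): {"s1": 1.0},
            ("s0", "slow"): {"s1": 1.0},
            ("s1", "fast"): {"s0": 1.0},
            ("s1", "slow"): {"s0": 1.0},
        }

        def reward(s, a):
            # immediate reward earned when action a completes in state s
            return {"fast": 1.0, "slow": 3.0}[a]

        # V[t][s]: best expected reward-to-go from time t in state s.
        # After the deadline nothing more can be earned, so V[T][s] = 0.
        V = [{s: 0.0 for s in states} for _ in range(T + 1)]
        policy = [{s: None for s in states} for _ in range(T + 1)]

        for t in range(T - 1, -1, -1):        # recurse backward from the deadline
            for s in states:
                best_val, best_a = 0.0, None  # idling until the deadline is worth 0
                for a in actions:
                    q = 0.0
                    for d, p_d in duration_dist[a].items():
                        if t + d > T:
                            continue          # this realization misses the deadline
                        # jump straight to the value at the completion time t + d
                        future = sum(p_s * V[t + d][s2]
                                     for s2, p_s in transition[(s, a)].items())
                        q += p_d * (reward(s, a) + future)
                    if q > best_val:
                        best_val, best_a = q, a
                V[t][s] = best_val
                policy[t][s] = best_a

        print("V[0]:", V[0])
        print("policy at t = 0:", policy[0])

    Because each action evaluation jumps over its entire duration, the recursion touches each (time, state, action, duration) combination once, so the cost is roughly O(T · |S| · |A| · D), where D is the number of distinct durations, instead of growing with the extra elapsed-time state component described above.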