Kee-Eung Kim, Thomas L. Dean, and Nicolas Meuleau
In stochastic planning problems formulated as factored Markov decision processes (MDPs), also called dynamic belief network MDPs (DBN-MDPs) (Boutilier, Dean, and Hanks 1999), finding the best policy (or conditional plan) is NP-hard. One difficulty is that the number of conditionals required to specify the policy can grow exponentially in the size of the representation of the MDP. Several recent algorithms find an approximate policy by restricting the representation of conditionals to decision trees. We propose an alternative policy representation for factored MDPs in terms of finite-state machine (FSM) controllers. Since, practically speaking, we are forced to limit the number of conditionals, we claim that there is a benefit to using FSM controllers: such controllers can use their internal state to maintain context information that would otherwise require a large conditional table or decision tree. Although the optimal policy might not be representable as a finite-state controller with a fixed amount of memory, we will be satisfied with finding a "good" policy; to that end, we derive a stochastic greedy-search algorithm based on recent developments in reinforcement learning (Baird and Moore 1999) and demonstrate its performance in some example domains.
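The central idea above, that an FSM controller's internal state can stand in for a large conditional table, can be illustrated with a minimal sketch. The class and tables below are hypothetical illustrations, not the paper's algorithm: a deterministic two-node controller remembers whether a "signal" observation has ever occurred, a distinction a memoryless (purely reactive) policy over current observations cannot express.

```python
class FSMController:
    """Minimal finite-state controller sketch (hypothetical names).

    node:         the controller's internal memory state
    action_table: maps node -> action to emit in that node
    transition:   maps (node, observation) -> next node
    """

    def __init__(self, action_table, transition, start_node=0):
        self.action_table = action_table
        self.transition = transition
        self.node = start_node

    def act(self, observation):
        # Emit the action associated with the current internal node,
        # then update the internal node based on the observation.
        action = self.action_table[self.node]
        self.node = self.transition[(self.node, observation)]
        return action


# Node 0 = "signal not yet seen", node 1 = "signal seen" (an absorbing node).
controller = FSMController(
    action_table={0: "wait", 1: "go"},
    transition={
        (0, "none"): 0,
        (0, "signal"): 1,   # remember the signal by switching nodes
        (1, "none"): 1,
        (1, "signal"): 1,
    },
)

# Once "signal" is observed, the controller keeps acting "go" even though
# the later observations ("none") are identical to the earlier ones:
# the distinction lives in the internal state, not in a conditional table.
actions = [controller.act(obs) for obs in ["none", "none", "signal", "none"]]
print(actions)  # -> ['wait', 'wait', 'wait', 'go']
```

Encoding the same behavior as a reactive conditional plan would require conditioning on the entire observation history, which is exactly the blow-up in conditionals that the internal state avoids.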