Nicolas Meuleau, Ronen Brafman, Emmanuel Benazera
In over-subscription planning (OSP), the goals cannot all be achieved jointly, and the task is to find a plan that attains the best feasible subset of goals given resource constraints. Recent classical OSP algorithms ignore the uncertainty inherent in many natural application domains where OSP problems arise. Although stochastic OSP problems are easy to model as MDPs, the resulting models are too large for standard solution approaches. Fortunately, OSP problems have a natural two-tiered hierarchy, and in this paper we adapt and extend tools developed in the hierarchical reinforcement learning community to exploit this hierarchy effectively and obtain compact, factored policies. Typically, such policies are sub-optimal, but under certain assumptions that hold in our planetary exploration domain, our factored solution is in fact optimal. Our algorithms work by repeatedly solving a number of smaller MDPs while propagating information between them. We evaluate several variants of this approach on a set of stochastic instances of a planetary rover domain, showing substantial performance gains.