Nicholas Armstrong-Crews, Geoffrey Gordon, Manuela Veloso
Partially Observable Markov Decision Processes, or POMDPs, are useful for representing a variety of decision problems; unfortunately, solving for an optimal policy is computationally intractable in general. In this paper, we present a set of novel search techniques for solving POMDPs approximately. We build on previous heuristic search and point-based algorithms, but improve upon them in several ways: we introduce an efficient method for approximating the convex hull of the upper bound, we expose embedded finite Markov structure in each bound, and we show how to prune aggressively while still maintaining convergence. The net result is a more targeted growth of the bound representations, leading to lower overall runtime and storage. We synthesize these contributions into a novel algorithm, which we call AAA-POMDP (Appropriately Acronymmed Algorithm). We describe its theoretical properties, including computational efficiency, and examine its performance on standard benchmark problems from the literature. On these problems, our algorithm exhibits competitive or superior performance when compared to previous methods.
Subjects: 12.1 Reinforcement Learning; 1.11 Planning
Submitted: May 5, 2008