William H. Turkett, Jr., Wake Forest University; and John R. Rose, University of South Carolina
While Partially-Observable Markov Decision Processes have become a popular means of representing realistic planning problems, exact approaches to finding POMDP policies are extremely computationally complex. An alternative approach for control in POMDP domains is to use run-time optimization over action sequences in a dynamic decision network. While exact algorithms have to generate a policy over the entire belief space, a run-time approach only needs to reason about the reachable parts of the belief space. By combining run-time planning with caching, it is possible to exploit locality of belief states in POMDPS, allowing for significant speedups in plan generation. This paper introduces and evaluates an exact caching mechanism and a grid-based caching mechanism.