Constant-Time Markov Tracking for Sequential POMDPs

Richard Washington

Stochastic representations of a problem domain can capture aspects that are otherwise difficult to model, such as errors, alternative outcomes of actions, and uncertainty about the world. Markov models (Bellman 1957) are a popular choice for constructing stochastic representations. The classic MDP, however, does not account for uncertainty in the process state. Often this state is known only indirectly through sensors or tests. To account for the state uncertainty, MDPs have been extended to partially observable MDPs (POMDPs) (Lovejoy 1991; Monahan 1982; Washington 1996). this model, the underlying process is an MDP, but rather than supplying exact state information, the process produces one of a set of observations. The major drawback of POMDPs is that finding an optimal plan is computationally intractable for realistic problems. We are inter~ted in seeing what limitations may be imposed that allow on-line computation. We have developed the approach of Markov Tracking (Washington 1997), which chooses actions that coordinate with a process rather than influence its behavior. It uses the POMDP model to follow the agent’s state, and reacts optimally to it. In this paper we discuss the application of Markov Tracking to a subclass of problems called sequential POAfDPs. Within this class the optimal action is not only computed locally, but in constant time, allowing true on-line performance with large-scale problems.

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.