Jonathan Tash, Stuart Russell
We present new algorithms for local planning over Markov decision processes. The base-level algorithm possesses several interesting features for control of computation, based on selecting computations according to their expected benefit to decision quality. The algorithms are shown to expand the agent’s knowledge where the world warrants it, with appropriate responsiveness to time pressure and randomness. We then develop an introspective algorithm, using an internal representation of what computational work has already been done. This strategy extends the agent’s knowledge base where warranted by the agent’s world model and the agent’s knowledge of the work already put into various parts of this model. It also enables the agent to act so as to take advantage of the computational savings inherent in staying in known parts of the state space. The control flexibility provided by this strategy, by incorporating natural problem-solving methods, directs computational effort towards where it’s needed better than previous approaches, providing grcatcr hopes for scalability to large domains.