Sridhar Mahadevan, Sarah Osentoski, Jeff Johns, Kimberly Ferguson, Chang Wang
This paper describes a new harmonic analysis framework for planning, based on estimating a diffusion model that captures information flow on a graph (discrete state space) or a manifold (continuous state space) using the Laplace heat equation. Diffusion models can be significantly easier to learn than transition models, yet provide many of the same speedups over model-free methods. Several types of diffusion models are described, including undirected and directed state-based models, as well as state-action models. Two methods for constructing novel basis representations from diffusion models are described: Fourier methods diagonalize a symmetric diffusion operator called the Laplacian; wavelet methods progressively dilate unit basis functions using powers of the diffusion operator. A new variant of policy iteration, called representation policy iteration, is described. It consists of an outer loop that estimates new basis functions by diagonalization or dilation, and an inner loop that learns the best policy representable within the linear span of the current basis functions. Results from continuous and discrete MDPs are provided to illustrate the new approach.
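To make the Fourier construction concrete, the following is a minimal sketch, not the authors' implementation. It assumes an undirected state graph with a symmetric weight matrix W and uses the combinatorial Laplacian L = D - W; the paper may use a different (for example, normalized) Laplacian. The 20-state chain and the names chain_adjacency, laplacian_basis, and target are hypothetical and chosen only for illustration. The sketch diagonalizes L, keeps the smoothest eigenvectors as basis functions, and fits a target function within their linear span by least squares, loosely mirroring the inner-loop idea of working within the span of the current basis.

```python
import numpy as np


def chain_adjacency(n):
    """Symmetric adjacency matrix of an n-state chain (hypothetical toy state space)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W


def laplacian_basis(W, k):
    """Return the k smallest eigenvalues and matching eigenvectors of L = D - W.

    Assumption: W is symmetric, so L is symmetric and eigh applies.
    """
    D = np.diag(W.sum(axis=1))            # degree matrix
    L = D - W                             # combinatorial graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    return eigvals[:k], eigvecs[:, :k]


n_states, n_basis = 20, 5
W = chain_adjacency(n_states)
eigvals, Phi = laplacian_basis(W, n_basis)

# Fit a smooth stand-in "value function" within the span of the basis.
target = np.linspace(0.0, 1.0, n_states) ** 2
weights, *_ = np.linalg.lstsq(Phi, target, rcond=None)
approx = Phi @ weights
```

The columns of Phi are orthonormal, and the first one is constant because the chain is connected, so the fit can be no worse than approximating the target by its mean. Adding more eigenvectors trades a larger basis for a smaller residual.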
Subjects: 12. Machine Learning and Discovery; 12.1 Reinforcement Learning
Submitted: Jun 26, 2007