Learning to Plan using Harmonic Analysis of Diffusion Models

Sridhar Mahadevan, Sarah Osentoski, Jeff Johns, Kimberly Ferguson, Chang Wang

This paper describes a new harmonic analysis framework for planning, based on estimating a diffusion model of information flow on a graph (discrete state space) or a manifold (continuous state space) using the Laplace heat equation. Diffusion models can be significantly easier to learn than transition models, yet provide many of the same speedups in performance over model-free methods. Several types of diffusion models are described, including undirected and directed state-based models, as well as state-action models. Two methods for constructing novel basis representations from diffusion models are described: Fourier methods diagonalize a symmetric diffusion operator called the Laplacian; wavelet methods dilate unit basis functions progressively using powers of the diffusion operator. A new variant of policy iteration -- called representation policy iteration -- is described, consisting of an outer loop that estimates new basis functions by diagonalization or dilation, and an inner loop that learns the best policy representable within the linear span of the current basis functions. Results from continuous and discrete MDPs are provided to illustrate the new approach.
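
As a rough illustration of the Fourier construction mentioned in the abstract, the following minimal sketch diagonalizes a graph Laplacian and uses its smoothest eigenvectors as basis functions. It is not the authors' implementation. The 20-state chain graph, the combinatorial Laplacian L = D - W, the function names, and the made-up target function are all assumptions chosen for illustration; the paper itself may use a different Laplacian, and it estimates the graph from samples.

```python
import numpy as np

def chain_graph_weights(n):
    """Symmetric adjacency matrix of an undirected n-state chain (hypothetical toy domain)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W

def laplacian_basis(W, k):
    """Return the k smallest eigenvalues of L = D - W and their eigenvectors."""
    D = np.diag(W.sum(axis=1))      # degree matrix
    L = D - W                       # combinatorial graph Laplacian (assumed variant)
    eigvals, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]

def project(basis, target):
    """Least-squares projection of a function on the states onto span(basis)."""
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return basis @ coeffs

n = 20
W = chain_graph_weights(n)
eigvals, basis = laplacian_basis(W, k=5)

# Made-up smooth target standing in for a value function (illustrative only).
target = np.exp(-0.2 * np.arange(n)[::-1])
approx = project(basis, target)
max_err = np.max(np.abs(target - approx))
```

In representation policy iteration as the abstract describes it, a step like `laplacian_basis` would play the role of the outer loop, while the inner loop would fit a policy's value function within the span of `basis`; here a plain least-squares projection stands in for that inner loop.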

Subjects: 12. Machine Learning and Discovery; 12.1 Reinforcement Learning

Submitted: Jun 26, 2007
