Jeff Johns, Sarah Osentoski, Sridhar Mahadevan
This paper summarizes ongoing research on a framework for representation learning based on harmonic analysis, a subfield of mathematics. Harmonic analysis includes Fourier analysis, in which new eigenvector representations are constructed by diagonalizing operators, and wavelet analysis, in which new representations are constructed by dilation. The approach is presented in the context of Markov decision processes (MDPs), a widely studied model of planning under uncertainty, although it applies more broadly to other areas of AI. We describe a novel harmonic analysis framework for planning based on estimating a diffusion model, which captures the flow of information on a graph (discrete state space) or a manifold (continuous state space) using a discrete form of the Laplace heat equation. Two methods for constructing novel plan representations from diffusion models are described: Fourier methods diagonalize a symmetric diffusion operator called the Laplacian, while wavelet methods progressively dilate unit basis functions using powers of the diffusion operator. A new planning framework called Representation Policy Iteration (RPI) is described, consisting of an outer loop that estimates new basis functions by diagonalization or dilation and an inner loop that learns the best policy representable within the linear span of the current basis functions. We demonstrate the flexibility of the approach, which allows basis functions to be adapted to a particular task or reward function and to the hierarchical, temporally extended nature of actions.
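As an informal illustration of dilation by powers of a diffusion operator, the following minimal Python sketch builds a lazy random walk on a small chain graph and applies dyadic powers of it to a unit basis function. The chain graph, the lazy-walk operator, the dyadic schedule of powers, and all function names are assumptions chosen for illustration; this is not the paper's implementation and omits the orthogonalization and compression steps a full wavelet construction would need.

```python
def chain_diffusion_operator(n):
    """Row-stochastic lazy random walk on an n-state chain (illustrative assumption)."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        T[i][i] = 0.5  # stay in place with probability 1/2
        for j in neighbors:
            T[i][j] = 0.5 / len(neighbors)  # split the rest among neighbors
    return T


def matmul(A, B):
    """Plain dense matrix product for lists of lists."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def dilated_functions(T, state, levels):
    """Diffuse the unit function at `state` by T^(2^j) for j = 0 .. levels-1."""
    out = []
    power = T
    for _ in range(levels):
        out.append(power[state][:])   # row `state` of the current power of T
        power = matmul(power, power)  # square to reach the next dyadic power
    return out


def support_size(f, tol=1e-12):
    """Number of states where the function is non-negligible."""
    return sum(1 for v in f if abs(v) > tol)
```

On a 17-state chain, `dilated_functions(chain_diffusion_operator(17), 8, 4)` yields functions centered on state 8 whose supports widen to 3, 5, 9, and 17 states, which is the sense in which higher powers of the operator produce coarser, more spread-out basis functions.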
Subjects: 12.1 Reinforcement Learning; 12. Machine Learning and Discovery
Submitted: Sep 14, 2007