Jeff Johns, Sridhar Mahadevan, Chang Wang
A new spectral approach to value function approximation has recently been proposed to automatically construct basis functions from samples. Global basis functions called proto-value functions are generated by diagonalizing a diffusion operator, such as a reversible random walk or the Laplacian, on a graph formed from connecting nearby samples. This paper addresses the challenge of scaling this approach to large domains. We propose using Kronecker factorization coupled with the Metropolis-Hastings algorithm to decompose reversible transition matrices. The result is that the basis functions can be computed on much smaller matrices and combined to form the overall bases. We demonstrate that in several continuous Markov decision processes, compact basis functions can be constructed without significant loss in performance. In one domain, basis functions were compressed by a factor of 36. A theoretical analysis relates the quality of the approximation to the spectral gap. Our approach generalizes to other basis constructions as well.
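The claim that basis functions "can be computed on much smaller matrices and combined" rests on a standard property of the Kronecker product: if A u = λ u and B v = μ v, then (A ⊗ B)(u ⊗ v) = λμ (u ⊗ v). The following is a minimal sketch of that combination step only, assuming the large operator is exactly a Kronecker product of two small symmetric factors. It does not show the paper's approximate factorization of a given transition matrix or the Metropolis-Hastings step, and the helper names `chain_walk` and `kron_bases` and the toy chain graphs are hypothetical choices for illustration.

```python
import numpy as np

def chain_walk(n):
    """Symmetrized lazy random walk on a chain of n states (toy factor, assumed)."""
    W = np.eye(n)                      # self-loops make the walk lazy
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0              # edges between neighboring states
    W[idx + 1, idx] = 1.0
    d = W.sum(axis=1)
    # D^{-1/2} W D^{-1/2} is symmetric and similar to the random walk D^{-1} W.
    return W / np.sqrt(np.outer(d, d))

def kron_bases(A, B, k):
    """Top-k eigenpairs of kron(A, B), built only from eigenpairs of A and B."""
    lam, U = np.linalg.eigh(A)         # small n x n problem
    mu, V = np.linalg.eigh(B)          # small m x m problem
    products = np.outer(lam, mu)       # eigenvalues of kron(A, B) are lam[i] * mu[j]
    order = np.argsort(products, axis=None)[::-1][:k]
    i, j = np.unravel_index(order, products.shape)
    vecs = np.stack([np.kron(U[:, a], V[:, b]) for a, b in zip(i, j)], axis=1)
    return products[i, j], vecs

A = chain_walk(6)
B = chain_walk(5)
vals, vecs = kron_bases(A, B, k=4)

# Check against the full 30 x 30 operator, which is formed here only for verification.
P = np.kron(A, B)
residual = np.linalg.norm(P @ vecs - vecs * vals)
print("residual:", residual)
```

In this toy setting the two eigenproblems have sizes 6 and 5 rather than 30, which is the source of the savings; when the true operator is only approximately a Kronecker product, the combined vectors are approximate bases rather than exact eigenvectors.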
Subjects: 12.1 Reinforcement Learning; 12. Machine Learning and Discovery
Submitted: Apr 24, 2007