Sridhar Mahadevan, Mauro Maggioni, Kimberly Ferguson, Sarah Osentoski
This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on the framework of proto-value functions, in which the underlying representation or basis functions are automatically derived from a spectral analysis of the state space manifold. The proto-value functions correspond to the eigenfunctions of the graph Laplacian. We describe an approach to extend the eigenfunctions to novel states using the Nyström extension. A least-squares policy iteration method is used to learn the control policy, where the underlying subspace for approximating the value function is spanned by the learned proto-value functions. A detailed set of experiments is presented using classic benchmark tasks, including the inverted pendulum and the mountain car, measuring the sensitivity to various parameters and including comparisons with a hand-coded parametric radial basis function approximator.
Subjects: 12.1 Reinforcement Learning; 12. Machine Learning and Discovery
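
As an illustration of the pipeline summarized in the abstract, the following is a minimal sketch of computing proto-value functions from sampled states and extending them to a novel state with the Nyström extension. It is not the authors' implementation. The dense Gaussian-kernel graph, the bandwidth `sigma`, the number of basis functions, the random 2D sample, and all function names are assumptions made for this example. The abstract does not specify how the graph is built.

```python
import numpy as np

def gaussian_weights(A, B, sigma):
    """Gaussian similarity between every row of A and every row of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def proto_value_functions(X, num_basis, sigma):
    """Smoothest eigenvectors of the normalized graph Laplacian on samples X.

    Assumption: a dense Gaussian graph (with self-weights) stands in for
    whatever graph construction the paper actually uses.
    """
    W = gaussian_weights(X, X, sigma)
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))      # D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - S                   # normalized Laplacian
    evals, evecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return evals[:num_basis], evecs[:, :num_basis], deg

def nystrom_extend(x_new, X, evals, evecs, deg, sigma):
    """Nystrom extension of the eigenvectors to a state not in the sample.

    Each eigenvector phi of L with eigenvalue lam satisfies S phi = (1 - lam) phi,
    so phi(x) is estimated as sum_j s(x, x_j) phi(x_j) / (1 - lam).
    """
    w = gaussian_weights(x_new[None, :], X, sigma)[0]
    s = w / np.sqrt(w.sum() * deg)           # normalized similarity to samples
    return (s @ evecs) / (1.0 - evals)

# Hypothetical sampled states from a 2D continuous state space.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 2))
evals, evecs, deg = proto_value_functions(X, num_basis=5, sigma=0.5)

# Feature vector for a novel state, usable as the basis in a linear
# value-function approximator such as least-squares policy iteration.
phi = nystrom_extend(np.array([0.1, -0.2]), X, evals, evecs, deg, sigma=0.5)
```

Under these assumptions, applying `nystrom_extend` to a state that is already in the sample reproduces that state's row of `evecs`, which is a convenient sanity check on the extension.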