Richard Maclin, Jude Shavlik, Lisa Torrey, Trevor Walker, Edward Wild
We present a novel formulation for providing advice to a reinforcement learner that employs support-vector regression as its function approximator. Our method extends a recent advice-giving technique, Knowledge-Based Kernel Regression (KBKR), which accepts advice concerning a single action of a reinforcement learner. In KBKR, users can state that, in some set of states, an action’s value should be greater than some linear expression of the current state. In our new technique, which we call Preference KBKR (Pref-KBKR), the user can instead recommend that one action be preferred over another in the specified set of states. Specifying preferences essentially means that users give advice about policies rather than Q-values, which is a more natural way for humans to present advice. We present the motivation for preference advice and a proof of the correctness of our extension to KBKR. In addition, we present empirical results showing that our method can make effective use of advice on a novel reinforcement-learning task, based on the RoboCup simulator, which we call Breakaway. Our work demonstrates the significant potential of advice-giving techniques for addressing complex reinforcement-learning problems, and further demonstrates the use of support-vector regression for reinforcement learning.
Content Area: 12. Machine Learning
Subjects: 12. Machine Learning and Discovery; 12.1 Reinforcement Learning
Submitted: May 9, 2005