Erik Talvitie, Satinder Singh
A long-lived agent continually faces new tasks in its environment. Such an agent may be able to use knowledge learned in solving earlier tasks to produce candidate policies for its current task. There may, however, be multiple reasonable policies suggested by prior experience, and the agent must choose between them potentially without any a priori knowledge about their applicability to its current situation. We present an "experts" algorithm for efficiently choosing amongst candidate policies in solving an unknown Markov decision process task. We conclude with the results of experiments on two domains in which we generate candidate policies from solutions to related tasks and use our experts algorithm to choose amongst them.
Subjects: 12.1 Reinforcement Learningn
Submitted: Oct 11, 2006