Learning State Features from Policies to Bias Exploration in Reinforcement Learning

Bryan Singer and Manuela Veloso, Carnegie Mellon University

When given several problems to solve in some domain, a standard reinforcement learner learns an optimal policy from scratch for each problem. This seems wasteful, since one might expect the solution to one problem to contain domain-specific information that would help in solving the next, and using this information would improve the reinforcement learner’s performance. However, policies learned by standard reinforcement learning techniques often depend heavily on the exact states, rewards, and state transitions of the particular problem. It is therefore infeasible to apply a learned policy directly to new problems, and several approaches have been and are being investigated to find structure, abstraction, generalization, and/or policy reuse in reinforcement learning. In our line of research, we describe each state in terms of local features, assuming that these state features, together with the learned policies, can be used to abstract the domain characteristics away from the specific layout of states and rewards of a particular problem. When given a new problem to solve, this abstraction is used as an exploration bias to improve the rate of convergence of a reinforcement learner.
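
As an illustration of the general idea only, the following minimal sketch shows one way a mapping from local state features to actions could be estimated from a previously learned policy and then used to bias exploration in a new problem. It is not the authors' algorithm: the tabular policy representation, the `features` function, the add-one smoothing, the epsilon-greedy scheme, and all names (`learn_feature_bias`, `choose_action`, `ACTIONS`) are assumptions introduced here.

```python
import random
from collections import defaultdict

# Hypothetical action set for a grid-like domain (an assumption).
ACTIONS = ["up", "down", "left", "right"]


def learn_feature_bias(policy, features):
    """Estimate P(action | local feature) from a learned policy.

    policy:   dict mapping state -> action chosen by the learned policy
    features: function mapping state -> hashable local feature description
    Add-one smoothing keeps every action possible during exploration.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for state, action in policy.items():
        counts[features(state)][action] += 1

    bias = {}
    for feat, action_counts in counts.items():
        total = sum(action_counts.values())
        bias[feat] = {
            a: (action_counts[a] + 1) / (total + len(ACTIONS)) for a in ACTIONS
        }
    return bias


def choose_action(q, state, feat, bias, epsilon, rng):
    """Epsilon-greedy selection whose exploratory moves follow the bias.

    q:    mapping (state, action) -> estimated value in the NEW problem
    feat: local feature description of `state` in the new problem
    With probability epsilon, sample an action from the feature bias
    (uniformly if the feature was never seen); otherwise act greedily.
    """
    if rng.random() < epsilon:
        probs = bias.get(feat)
        if probs is None:
            return rng.choice(ACTIONS)
        return rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

In this sketch, the bias only changes which actions are tried during exploration; the value estimates for the new problem are still learned from that problem's own rewards, so a misleading bias would be expected to slow learning rather than fix the final policy.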
