Lihong Li, Vadim Bulitko, and Russell Greiner
We investigate the use of function approximation in reinforcement learning (RL) where the agent’s control policy is represented as a classifier mapping states to actions. The innovation of this paper lies in introducing a measure of a state’s decision-making importance. We then use an efficient approximation to this measure as the misclassification cost when learning the agent’s policy. As a result, the focused learning process is shown to converge faster to better policies.
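To make the idea of importance-weighted misclassification costs concrete, the following is a minimal sketch rather than the method developed in the paper. It assumes, purely for illustration, that a state's importance is the gap between its best and worst action values, and that a table of action values is available. The names `state_importance`, `weighted_policy_cost`, and `q_table` are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def state_importance(q_values: Sequence[float]) -> float:
    """Illustrative importance (an assumption, not the paper's definition):
    the gap between the best and the worst action value in a state."""
    return max(q_values) - min(q_values)


def weighted_policy_cost(
    q_table: Dict[Hashable, Sequence[float]],
    policy: Dict[Hashable, int],
) -> float:
    """Sum the importance of every state in which the policy's action
    differs from the greedy action under q_table."""
    cost = 0.0
    for state, q_values in q_table.items():
        # Greedy action: index of the largest action value.
        greedy = max(range(len(q_values)), key=lambda a: q_values[a])
        if policy[state] != greedy:
            cost += state_importance(q_values)
    return cost


# Two states: the choice barely matters in "s0" but matters a lot in "s1".
q_table = {"s0": [1.0, 1.1], "s1": [0.0, 5.0]}

# Each policy misclassifies exactly one state, so their 0/1 error is equal.
policy_a = {"s0": 0, "s1": 1}  # wrong only in the unimportant state
policy_b = {"s0": 1, "s1": 0}  # wrong only in the important state

cost_a = weighted_policy_cost(q_table, policy_a)
cost_b = weighted_policy_cost(q_table, policy_b)
```

Under these assumptions the two policies have the same plain classification error, yet `cost_b` is far larger than `cost_a`. A cost-sensitive learner given such weights would therefore concentrate on the states where a wrong action is most expensive.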