Terran Lane, University of New Mexico; and Andrew Wilson, Sandia National Laboratories
We examine the application of relational learning methods to reinforcement learning in spatial navigation tasks. Specifically, we consider a goal-seeking agent with noisy control actions embedded in an environment with strong topological structure. Although the task is formally a Markov decision process (MDP), it possesses special structure, derived from the underlying topology, that can be exploited to speed learning. We describe relational policies for such environments that are relocatable by virtue of being parameterized solely in terms of the relations (distance and direction) between the agent’s current state and the goal state. We demonstrate that this formulation yields significant learning improvements in completely homogeneous environments, for which exact policy relocation is possible. We also examine the effects of non-homogeneities such as walls or obstacles and show that their effects can be neglected if they fall outside a closed-form envelope surrounding the optimal path between the agent and the goal. To our knowledge, this is the first closed-form result for the structure of an envelope in an MDP. We demonstrate that relational reinforcement learning in an environment that obeys the envelope constraints also yields substantial improvements in learning performance.
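The abstract does not specify the learning algorithm, noise model, or state encoding, so the following is only a minimal sketch of what a relocatable relational policy can look like. It assumes tabular Q-learning, an unbounded and obstacle-free (fully homogeneous) 2-D grid, and a simple "slip" noise model in which a random action occasionally replaces the chosen one. All of these, along with every function name (`relational_state`, `noisy_step`, `greedy_action`, `train`, `policy`) and parameter value, are illustrative assumptions, not the authors' method. The key point is that the Q table is keyed on the agent-to-goal offset rather than on absolute coordinates. It does not model walls or the envelope result.

```python
import random
from collections import defaultdict

# Assumed action set: unit moves on an unbounded, obstacle-free 2-D grid.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def relational_state(agent, goal):
    """Encode the state only by the offset (distance and direction) from agent to goal."""
    return (goal[0] - agent[0], goal[1] - agent[1])


def noisy_step(agent, action, rng, slip=0.1):
    """Assumed noise model: with probability `slip` a uniformly random action replaces the chosen one."""
    if rng.random() < slip:
        action = rng.choice(ACTIONS)
    return (agent[0] + action[0], agent[1] + action[1])


def greedy_action(q, s):
    """Greedy action for relational state s (ties go to the earlier entry in ACTIONS)."""
    return max(ACTIONS, key=lambda a: q[(s, a)])


def train(episodes=5000, radius=4, alpha=0.2, gamma=0.95, epsilon=0.2, seed=0):
    """Hypothetical tabular Q-learning over relational states; returns the Q table."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(offset, action)], zero-initialised
    for _ in range(episodes):
        # Goals are scattered across the grid. Because the state is relational,
        # experience gathered near one goal is shared by every other goal.
        goal = (rng.randint(-100, 100), rng.randint(-100, 100))
        agent = (goal[0] + rng.randint(-radius, radius),
                 goal[1] + rng.randint(-radius, radius))
        for _ in range(50):  # cap on episode length
            s = relational_state(agent, goal)
            if s == (0, 0):
                break
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy_action(q, s)
            agent = noisy_step(agent, a, rng)
            s2 = relational_state(agent, goal)
            if s2 == (0, 0):
                target = -1.0  # reaching the goal ends the episode
            else:
                target = -1.0 + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def policy(q, agent, goal):
    """Relocatable policy: the chosen move depends only on agent-goal relations."""
    return greedy_action(q, relational_state(agent, goal))


q = train()
print(policy(q, (0, 0), (3, 0)))      # goal three cells in the +x direction
print(policy(q, (10, 20), (13, 20)))  # same relation at a different location
```

Because the table never sees absolute coordinates, the two calls above necessarily return the same move. In a homogeneous environment this is exactly the "relocation" that lets one learned policy serve every goal position. With walls or obstacles the same offset can call for different moves, which is why the sketch deliberately assumes an empty grid.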