Sandeep Goel and Manfred Huber
Reinforcement learning addresses the problem of learning to select actions so as to maximize an agent’s performance in unknown environments. To scale reinforcement learning to complex real-world tasks, agents must be able to discover hierarchical structure within their learning and control systems. This paper presents a method by which a reinforcement learning agent can discover subgoals with certain structural properties. By discovering subgoals and including policies that reach them as actions in its action set, the agent can explore more effectively and accelerate learning in other tasks in the same or similar environments where the same subgoals are useful. The agent discovers subgoals by searching a learned policy model for states that exhibit certain structural properties. The approach is illustrated using gridworld tasks.
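To give a concrete feel for the kind of structural property involved, the sketch below is a simplified, count-based illustration (not the paper's exact criterion): in a two-room gridworld, trajectories generated by a greedy policy toward the goal all funnel through the single doorway, so that bottleneck state accumulates disproportionately many visits relative to ordinary room cells. The grid layout, the BFS-distance stand-in for a learned policy, and the coordinates are all illustrative assumptions.

```python
from collections import deque, Counter

# Two-room gridworld joined by a single doorway at (3, 4).
# '#' = wall, '.' = free cell; the goal sits in the right room.
GRID = [
    "#########",
    "#...#...#",
    "#...#...#",
    "#.......#",
    "#...#...#",
    "#########",
]
GOAL = (1, 7)
MOVES = ((-1, 0), (1, 0), (0, -1), (0, 1))  # fixed order: deterministic tie-breaking

def neighbors(cell):
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in MOVES if GRID[r + dr][c + dc] == "."]

# Stand-in for a learned policy: greedy descent on BFS distance-to-goal.
dist = {GOAL: 0}
queue = deque([GOAL])
while queue:
    s = queue.popleft()
    for n in neighbors(s):
        if n not in dist:
            dist[n] = dist[s] + 1
            queue.append(n)

def rollout(start):
    """Follow the greedy policy from `start`; return the states visited."""
    path, s = [start], start
    while s != GOAL:
        s = min(neighbors(s), key=dist.get)
        path.append(s)
    return path

# Count how many rollouts (one per start state) pass through each state.
counts = Counter()
for start in dist:
    counts.update(set(rollout(start)))

doorway = (3, 4)
# Every one of the 12 left-room starts must cross the doorway, plus the
# doorway's own rollout, so counts[doorway] == 13; an interior room cell
# such as (4, 1) is visited only by its own rollout.
print(counts[doorway], counts[(4, 1)])
```

States on the direct corridor to the goal also accumulate high counts, which is why the paper searches for structural properties of the learned policy model rather than simply taking raw visitation maxima.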