Nicholas K. Jong and Peter Stone
Hierarchical methods have attracted much recent attention as a means of scaling reinforcement learning algorithms to increasingly complex, real-world tasks. These methods provide two important kinds of abstraction that facilitate learning. First, hierarchies organize actions into temporally abstract high-level tasks. Second, they facilitate task-dependent state abstractions that allow each high-level task to restrict attention to only the relevant state variables. In most approaches to date, the user must supply suitable task decompositions and state abstractions to the learner. How to discover these hierarchies automatically remains a challenging open problem. As a first step towards solving this problem, we introduce a general method for determining the validity of potential state abstractions that might form the basis of reusable tasks. We build a probabilistic model of the underlying Markov decision problem and then statistically test the applicability of the state abstraction. We demonstrate the ability of our procedure to discriminate between safe and unsafe state abstractions in the familiar Taxi domain.