Duncan Potts and Bernhard Hengst
Task hierarchies can be used to decompose an intractable problem into smaller, more manageable tasks. This paper explores how task hierarchies can model a domain for control purposes, and examines an existing algorithm (HEXQ) that automatically discovers a task hierarchy through interaction with the environment. HEXQ has two limitations: its initial performance can be limited because it must adequately explore each level of the hierarchy before starting construction of the next, and it cannot adapt to a dynamic environment. The contribution of this paper is an algorithm that avoids any protracted period of initial exploration by discovering multiple levels of the hierarchy simultaneously. This can significantly improve initial performance, as the agent takes advantage of all hierarchical levels early in its development. Robustness is also improved because undiscovered features and environment changes can be incorporated into the hierarchy later. Empirical results show that the new algorithm significantly outperforms HEXQ.