Juan C. Santamaría and Ashwin Ram
Autonomous agents engaged in a continuous interaction with an incompletely known environment face the problem of dual control. Simply stated, actions are necessary not only for studying the environment, but also for making progress on the task. In other words, actions must bear a "dual" character: They must be investigators to some degree, but also directors to some degree. Because the number of variables involved in the solution of the dual control problem increases with the number of decision stages, the exact solution of the dual control problem is computationally intractable except for a few special cases. This paper provides an overview of dual control theory and proposes a heuristic approach towards obtaining a near-optimal dual control method that can be implemented. The proposed algorithm selects control actions taking into account the information contained in past observations as well as the possible information that future observations may reveal. In short, the algorithm anticipates the fact that future learning is possible and selects the control actions accordingly. The algorithm uses memory-based methods to associate long-term benefit estimates to belief states and actions, and selects the actions to execute next according to such estimates. The algorithm uses the outcome of every experience to progressively refine the long-term benefit estimates so that it can make better, improved decisions as it progresses. The algorithm is tested on a classical simulation problem.