Daishi Harada, University of California, Berkeley
Suppose we allow the controller to perform arbitrary search and to base its control on the backed-up information. To do this, we must make two decisions: the order in which search nodes are expanded, and when to stop searching and actually "commit" to a control. Our approach is to view these decisions as a meta-level control problem. With some care in the formulation, a solution to this meta-level control problem yields a bounded optimal controller. We would like to solve this problem using algorithms from reinforcement learning.
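To make the idea concrete, here is a minimal sketch of a toy meta-level control problem solved with tabular Q-learning. Everything in it is an illustrative assumption rather than the formulation used here. There are `K` object-level controls whose values are 0 or 1 with equal probability. A meta-level "compute" action stands in for expanding a search node and reveals one control's value at cost `COST`. `STOP` commits to the control with the best current estimate. The names `meta_actions`, `commit_value`, `step`, `q_learning`, and `greedy` are all hypothetical.

```python
import random

# Hypothetical toy meta-level MDP (an illustrative assumption, not the
# formulation from the text): K object-level controls with unknown values in
# {0, 1}, each equally likely. A computation reveals one control's value at
# cost COST; STOP commits to the control with the highest current estimate.
K = 2
COST = 0.1
PRIOR_MEAN = 0.5
STOP = "stop"


def meta_actions(state):
    """Meta-level actions: stop, or compute (reveal) any unrevealed control."""
    return [STOP] + [i for i, v in enumerate(state) if v is None]


def commit_value(state):
    """Expected utility of committing now: the best posterior mean."""
    return max(PRIOR_MEAN if v is None else v for v in state)


def step(state, action, true_values):
    """Return (reward, next_state); next_state is None once we commit."""
    if action == STOP:
        return commit_value(state), None
    revealed = list(state)
    revealed[action] = true_values[action]
    return -COST, tuple(revealed)


def q_learning(episodes=50000, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning over meta-level states."""
    rng = random.Random(seed)
    q, counts = {}, {}
    for _ in range(episodes):
        true_values = [rng.randint(0, 1) for _ in range(K)]
        state = (None,) * K  # nothing has been computed yet
        while state is not None:
            acts = meta_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(acts)
            else:
                action = max(acts, key=lambda a: q.get((state, a), 0.0))
            reward, nxt = step(state, action, true_values)
            target = reward
            if nxt is not None:
                target += max(q.get((nxt, a), 0.0) for a in meta_actions(nxt))
            key = (state, action)
            counts[key] = counts.get(key, 0) + 1
            old = q.get(key, 0.0)
            q[key] = old + (target - old) / counts[key]  # 1/n step size
            state = nxt
    return q


def greedy(q, state):
    """Learned meta-level policy: best action in a given meta-level state."""
    return max(meta_actions(state), key=lambda a: q.get((state, a), 0.0))
```

With `COST = 0.1`, computing once from the empty state is worth 0.75 - 0.1 = 0.65, which beats committing immediately (0.5). After one reveal, further computation no longer pays. So `greedy(q, (None, None))` should return a compute action, and `greedy(q, (0, None))` should return `"stop"`. The learned policy thus decides both which node to expand and when to commit. The 1/n step size is one simple choice among many.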