We consider a novel use of mostly-correct reactive policies. In classical planning, reactive policy learning approaches can find good policies from solved trajectories of small problems, and such policies have been successfully applied to larger problems in the target domains. Because they are learned inductively, these policies are often mostly correct but commit errors in some portion of the states. Discrepancy search was developed to exploit the structure of a heuristic function that is mostly correct. In this paper, we propose using machine-learned reactive policies in discrepancy search to improve their performance. In our experiments on benchmark planning domains, the proposed approach outperformed both the learned policies themselves and policy rollout with the learned policies. As an extension, we consider using reactive policies in heuristic search: during each node expansion, we add to the search queue all the states that occur along the trajectory of the given policy from that node. Experiments show that this approach greatly improves the performance of heuristic search on benchmark planning domains.
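To make the central idea concrete, the following is a minimal sketch of discrepancy search driven by a reactive policy. It is an illustration under stated assumptions, not the implementation evaluated in the paper. The function names, the toy corridor domain, the depth bound, and the rule that any deviation from the policy's top-ranked action costs exactly one discrepancy are all hypothetical choices made for this example.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

State = Hashable
Action = Hashable


def discrepancy_search(
    state: State,
    is_goal: Callable[[State], bool],
    ranked_actions: Callable[[State], Sequence[Action]],
    step: Callable[[State, Action], State],
    depth: int,
    discrepancies: int,
) -> Optional[List[Action]]:
    """Depth-bounded search allowing at most `discrepancies` deviations
    from the policy's top-ranked action. Returns a plan or None."""
    if is_goal(state):
        return []
    if depth == 0:
        return None
    for rank, action in enumerate(ranked_actions(state)):
        # Assumption: following the policy is free, any other action costs 1.
        cost = 0 if rank == 0 else 1
        if cost > discrepancies:
            break
        tail = discrepancy_search(
            step(state, action), is_goal, ranked_actions, step,
            depth - 1, discrepancies - cost,
        )
        if tail is not None:
            return [action] + tail
    return None


def iterative_discrepancy_search(
    start: State,
    is_goal: Callable[[State], bool],
    ranked_actions: Callable[[State], Sequence[Action]],
    step: Callable[[State, Action], State],
    depth: int,
    max_discrepancies: int,
) -> Optional[Tuple[int, List[Action]]]:
    """Try discrepancy limits 0, 1, 2, ... and return the first plan found."""
    for k in range(max_discrepancies + 1):
        plan = discrepancy_search(start, is_goal, ranked_actions, step, depth, k)
        if plan is not None:
            return k, plan
    return None


# Hypothetical toy domain: walk along a corridor from cell 0 to cell 5.
GOAL = 5


def toy_is_goal(s: int) -> bool:
    return s == GOAL


def toy_step(s: int, a: str) -> int:
    return s + 1 if a == "right" else s - 1


def toy_policy(s: int) -> List[str]:
    # A "mostly correct" policy: it prefers "right" everywhere except cell 2,
    # where it wrongly prefers "left".
    return ["left", "right"] if s == 2 else ["right", "left"]


result = iterative_discrepancy_search(
    0, toy_is_goal, toy_policy, toy_step, depth=5, max_discrepancies=3
)
print(result)
```

In this toy domain, following the policy alone loops between cells 1 and 2 and never reaches the goal, whereas permitting a single deviation at the faulty state is enough to recover a plan. Trying small discrepancy limits first reflects the assumption that the policy is wrong in only a few states.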
Subjects: 1.11 Planning; 15.4 Reactive Control
Submitted: May 17, 2006