Jia-Hong Wu, Rajesh Kalyanam, Robert Givan
Enforced hill-climbing is an effective deterministic hill-climbing technique that deals with local optima using breadth-first search (a process called ``basin flooding''). We propose and evaluate a stochastic generalization of enforced hill-climbing for online use in goal-oriented probabilistic planning problems. We assume a provided heuristic function estimating expected cost to the goal with flaws such as local optima and plateaus that thwart straightforward greedy action choice. While breadth-first search is effective in exploring basins around local optima in deterministic problems, for stochastic problems we dynamically build and solve a local Markov-decision process model of the basin in order to find a good escape policy exiting the local optimum. We evaluate our proposal in a wide range of recent probabilistic planning-competition benchmark domains. For evaluation, we show that stochastic enforced hill-climbing produces better policies than greedy action choice for value functions derived in two very different ways. First, we propose a novel heuristic function derived from the ideas in the effective re-planner FF-Replan. This new ``controlled-randomness FF heuristic'' is the deterministic FF heuristic computed on the simple determinization of the probabilistic problem that makes available a deterministic transition wherever a probabilistic transition was possible. Our results show that stochastic enforced hill-climbing with this heuristic significantly outperforms simply greedily following the heuristic, and also substantially outperforms FF-Replan. We additionally evaluate our technique on automatically learned value functions that on their own perform at the state-of-the-art when used to construct a greedy policy, and again show significant improvement over greedy action selection.
Subjects: 1.11 Planning; 15.7 Search
Submitted: Jun 27, 2008