AAAI Publications, Twenty-Ninth AAAI Conference on Artificial Intelligence

Better Be Lucky than Good: Exceeding Expectations in MDP Evaluation
Thomas Keller, Florian Geißer

Last modified: 2015-03-04


We introduce the MDP-Evaluation Stopping Problem, the optimization problem faced by participants of the International Probabilistic Planning Competition 2014 who focus on their own performance. The problem can be modeled as a meta-MDP whose actions correspond to the application of a policy on a base-MDP, but solving this meta-MDP is intractable in practice. Our theoretical analysis reveals tractable special cases in which the problem reduces to an optimal stopping problem. We derive high-quality approximate strategies by relaxing the general problem to an optimal stopping problem. We show both theoretically and experimentally that it pays off to pursue luck in the execution of the optimal policy, and that there are even cases where it is better to be lucky than good: the execution of a suboptimal base policy can be part of an optimal strategy in the meta-MDP.
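
To illustrate what an optimal stopping problem looks like, the following is a minimal sketch of a generic finite-horizon stopping problem solved by backward induction. It is not the model or algorithm from the paper. It assumes a single policy whose run reward follows a known discrete distribution, that only the reward of the run at which we stop counts, and that an earlier run cannot be recalled. The function name `stopping_values` and the reward distribution are hypothetical and chosen only for illustration.

```python
def stopping_values(outcomes, max_runs):
    """Expected result of stopping optimally with k runs still available.

    outcomes: list of (reward, probability) pairs for one run
              (an assumed, known distribution; probabilities sum to 1).
    Returns a list `values` where values[k] is the expected final reward
    with k runs left; values[0] is -inf because stopping without any run
    is not allowed in this toy model.
    """
    values = [float("-inf")]
    for _ in range(max_runs):
        continuation = values[-1]  # value of rejecting the next run's result
        # Accept a run's reward only if it beats the value of continuing.
        values.append(sum(p * max(r, continuation) for r, p in outcomes))
    return values


# Hypothetical run outcome: reward 0 or 1 with equal probability.
values = stopping_values([(0.0, 0.5), (1.0, 0.5)], max_runs=3)
```

In this sketch the expected result grows with the number of runs still available, since a poor result can be discarded in the hope of a luckier one. This is the sense in which pursuing luck pays off in such a simplified setting.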


Optimal Stopping Problem; Secretary Problem; MDP; Planning under Uncertainty; IPPC; UCT
