Peng Dai, Mausam, Daniel S. Weld
Dynamic programming methods (including value iteration, LAO*, RTDP, and derivatives) are popular algorithms for solving Markov decision processes (MDPs). Unfortunately, however, these techniques store the MDP model extensionally in a table and thus are limited by the amount of main memory available. Since the required space is exponential in the number of domain features, these dynamic programming methods are ineffective for large problems. To address this problem, Edelcamp et al. devised the external memory value iteration (EMVI) algorithm, which uses a clever sorting scheme to efficiently move parts of the model between disk and main memory. While EMVI can handle larger problems than previously addressed, the need to repeatedly perform external sorts still limits scalability. This paper proposes a new approach. We partition an MDP into smaller pieces (blocks), keeping just the relevant blocks in memory and performing Bellman backups block by block. Experiments show that our algorithm is able to solve large MDPs an order of magnitude faster than EMVI.
Subjects: 1.11 Planning; Please choose a second document classification
Submitted: Apr 16, 2008