Guy Shani, Ronen I. Brafman, Solomon E. Shimony
The recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods, which quickly converge to an approximate solution for medium-sized problems. Of this family, HSVI, which uses trial-based asynchronous value iteration, can handle the largest domains. In this paper we suggest a new algorithm, FSVI, that uses the underlying MDP to traverse the belief space towards rewards, finding sequences of useful backups, and we show that it scales up better than HSVI on larger benchmarks.
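The abstract describes FSVI only at a high level. The Python sketch below illustrates the idea under stated assumptions. It uses a small tabular POMDP stored as NumPy arrays `T`, `O`, `R`, whose names and layout are hypothetical, and a known set of goal states that ends a trial. It also relies on a standard point-based alpha-vector backup. Each trial samples a state from the initial belief and lets the underlying MDP's greedy policy choose actions for that sampled state, while the belief is updated from sampled observations. The visited beliefs are then backed up in reverse order. The helper names, the depth cap, and the constant lower-bound initialization are illustrative choices and are not necessarily the paper's.

```python
import numpy as np

# Assumed tabular POMDP layout (hypothetical, for illustration only):
#   T[a, s, s2]  transition probabilities
#   O[a, s2, o]  probability of observing o after action a lands in s2
#   R[s, a]      immediate reward; gamma is the discount factor


def mdp_q_values(T, R, gamma, iters=500):
    """Value iteration on the underlying fully observable MDP; returns Q[s, a]."""
    V = np.zeros(T.shape[1])
    for _ in range(iters):
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
        V = Q.max(axis=1)
    return Q


def belief_update(b, a, o, T, O):
    """Bayes filter: b2(s2) is proportional to O[a, s2, o] * sum_s b(s) T[a, s, s2]."""
    b2 = O[a, :, o] * (b @ T[a])
    return b2 / b2.sum()


def point_based_backup(b, alphas, T, O, R, gamma):
    """Bellman backup of the alpha-vector set at a single belief point b."""
    best, best_val = None, -np.inf
    for a in range(T.shape[0]):
        alpha_a = R[:, a].astype(float)
        for o in range(O.shape[2]):
            # g[i, s] = sum_s2 T[a, s, s2] * O[a, s2, o] * alphas[i][s2]
            g = np.array([T[a] @ (O[a, :, o] * al) for al in alphas])
            alpha_a = alpha_a + gamma * g[np.argmax(g @ b)]
        if alpha_a @ b > best_val:
            best, best_val = alpha_a, alpha_a @ b
    return best


def value(b, alphas):
    """Value of belief b under the current alpha-vector set."""
    return max(al @ b for al in alphas)


def fsvi_trial(b0, alphas, T, O, R, gamma, Q, goal_states, rng, max_depth=50):
    """One forward trial: the MDP policy on a sampled *state* picks the actions,
    the belief is tracked alongside, then visited beliefs are backed up in reverse."""
    n_s, n_o = T.shape[1], O.shape[2]
    s, b, visited = int(rng.choice(n_s, p=b0)), b0, []
    while s not in goal_states and len(visited) < max_depth:
        visited.append(b)
        a = int(np.argmax(Q[s]))                # greedy MDP action for the sampled state
        s = int(rng.choice(n_s, p=T[a, s]))     # sample the next state
        o = int(rng.choice(n_o, p=O[a, s]))     # sample an observation from that state
        b = belief_update(b, a, o, T, O)
    for bt in reversed(visited):                # back up from the rewarding end towards b0
        alphas.append(point_based_backup(bt, alphas, T, O, R, gamma))
    return alphas


def fsvi(b0, T, O, R, gamma, goal_states, n_trials=20, seed=0):
    """Run several trials, starting from a constant lower bound min(R)/(1-gamma)."""
    rng = np.random.default_rng(seed)
    Q = mdp_q_values(T, R, gamma)
    alphas = [np.full(T.shape[1], R.min() / (1.0 - gamma))]
    for _ in range(n_trials):
        fsvi_trial(b0, alphas, T, O, R, gamma, Q, goal_states, rng)
    return alphas
```

In this sketch, actions depend only on the MDP policy at the sampled state, not on the belief. This keeps each trial cheap. One consequence is that the traversal never deliberately chooses an action just to gain information unless the MDP policy already prefers it.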
Subjects: 12.1 Reinforcement Learning; 1.11 Planning
Submitted: Oct 15, 2006