Forward Search Value Iteration for POMDPs

Guy Shani, Ronen I. Brafman, Solomon E. Shimony

Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods which quickly converge to an approximate solution for medium-sized problems. Of this family HSVI, which uses trial-based asynchronous value iteration, can handle the largest domains. In this paper we suggest a new algorithm, FSVI, that uses the underlying MDP to traverse the belief space towards rewards, finding sequences of useful backups, and show how it scales up better than HSVI on larger benchmarks.

Subjects: 12.1 Reinforcement Learning; 1.11 Planning

Submitted: Oct 15, 2006


This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.