Michael R. James, Satinder Singh
Models of dynamical systems based on predictive state representations (PSRs) use predictions of future observations as their representation of state. A main departure from traditional models such as partially observable Markov decision processes (POMDPs) is that the PSR-model state is composed entirely of observable quantities. PSRs have recently been extended to a class of models called memory-PSRs (mPSRs) that use both memory of past observations and predictions of future observations in their state representation. Thus, mPSRs preserve the PSR-property of the state being composed of observable quantities while potentially revealing structure in the dynamical system that is not exploited in PSRs. In this paper, we demonstrate that the structure captured by mPSRs can be exploited quite naturally for stochastic planning based on value-iteration algorithms. In particular, we adapt the incremental-pruning (IP) algorithm defined for planning in POMDPs to mPSRs. Our empirical results show that our modified IP on mPSRs outperforms, in most cases, IP on both PSRs and POMDPs.
Content Area: 13. Markov Decision Processes & Uncertainty
Subjects: 12.1 Reinforcement Learning; 3.4 Probabilistic Reasoning
Submitted: May 10, 2005