Daniel Szer, Francois Charpillet
We introduce point-based dynamic programming (DP) for decentralized partially observable Markov decision processes (DEC-POMDPs), a new discrete DP algorithm for planning strategies for cooperative multi-agent systems. Our approach makes a connection between optimal DP algorithms for partially observable stochastic games, and point-based approximations for single-agent POMDPs. We show for the first time how relevant multi-agent belief states can be computed. Building on this insight, we then show how the linear programming part in current multi-agent DP algorithms can be avoided, and how multi-agent DP can thus be applied to solve larger problems. We derive both an optimal and an approximated version of our algorithm, and we show its efficiency on test examples from the literature.
Subjects: 7.1 Multi-Agent Systems; 15.4 Reactive Control