Raghav Aras, Alain Dutech, François Charpillet
We consider the problem of finding an n-agent joint-policy for the optimal finite-horizon control of a decentralized Pomdp (Dec-Pomdp). This is a problem of very high complexity (NEXP-hard in n ≥ 2). In this paper, we propose a new mathematical programming approach for the problem. Our approach is based on two ideas: First, we represent each agent's policy in the sequence-form and not in the tree-form, thereby obtaining a very compact representation of the set of joint-policies. Second, using this compact representation, we solve this problem as an instance of combinatorial optimization for which we formulate a mixed integer linear program (MILP). The optimal solution of the MILP directly yields an optimal joint-policy for the Dec-Pomdp. Computational experience shows that formulating and solving the MILP requires significantly less time to solve benchmark Dec-Pomdp problems than existing algorithms. For example, the multi-agent tiger problem for horizon 4 is solved in 72 secs with the MILP whereas existing algorithms require several hours to solve it.
Subjects: 7.1 Multi-Agent Systems; 15.5 Decision Theory
Submitted: Jun 27, 2007