Carlos Guestrin, Shobha Venkataraman, and Daphne Koller
We present a new, principled and efficient algorithm for decision making and planning in cooperative multi-agent dynamic systems. We consider systems where the agents’ value function is a sum of local value rules that specify an increment to the value in certain contexts, which can depend both on the current state and on the actions of some subset of the agents. We show that the task of finding an optimal joint action relative to this type of value function leads to a very natural communication pattern, where agents send messages along a coordination graph determined by the structure of the value rules. We show that the coordination structure depends on the state of the system, and even on the actual numerical values assigned to the value rules. We then show how to apply this framework to the task of multi-agent planning in dynamic systems. We view the entire multi-agent system as a single, large Markov decision process (MDP). We assume that the agents’ reward functions and the system dynamics are described in terms of factored rules. We show how to use an efficient linear programming algorithm to derive a rule-based value function that approximates the optimal joint value function. Given this value function, the agents then apply the coordination graph algorithm at each step of the process to decide on a joint action, potentially leading to a different coordination pattern at each step of the plan.
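The action-selection step described above can be illustrated with a small sketch. It assumes that a value rule is a hypothetical `(context, value)` pair, where the context is a partial assignment to state variables and agent actions; that the current state is fully observed; and that an elimination order over the agents is given. Rules inconsistent with the observed state are dropped first, so the remaining rules (and hence which agents must coordinate) depend on the state. Agents are then eliminated one at a time. Each elimination produces new rules over the eliminated agent's neighbours, which play the role of the messages sent along the coordination graph, and best responses are recovered by back-substitution. For simplicity, the max-out step enumerates the local scope exhaustively. This is a simplification and not the more compact rule-based elimination the paper describes; all function names are illustrative.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical representation: a value rule adds `value` to the total
# whenever its context (a partial assignment) is consistent with the full
# assignment of state variables and agent actions.
Rule = Tuple[Dict[str, str], float]


def restrict_to_state(rules: List[Rule], state: Dict[str, str]) -> List[Rule]:
    """Drop rules that contradict the observed state; strip state variables."""
    out = []
    for ctx, val in rules:
        if any(v in state and state[v] != x for v, x in ctx.items()):
            continue  # this rule does not apply in the current state
        out.append(({v: x for v, x in ctx.items() if v not in state}, val))
    return out


def matches(ctx: Dict[str, str], assign: Dict[str, str]) -> bool:
    return all(assign.get(v) == x for v, x in ctx.items())


def eliminate(rules: List[Rule], agent: str, domains: Dict[str, List[str]]):
    """Max out one agent's action; return the new rules and its best responses."""
    involved = [r for r in rules if agent in r[0]]
    rest = [r for r in rules if agent not in r[0]]
    # Neighbours of `agent` in the coordination graph: agents sharing a rule.
    scope = sorted({v for ctx, _ in involved for v in ctx if v != agent})
    new_rules: List[Rule] = []
    best_response = {}
    for combo in product(*(domains[v] for v in scope)):
        assign = dict(zip(scope, combo))
        best_val, best_act = None, None
        for a in domains[agent]:
            assign[agent] = a
            total = sum(val for ctx, val in involved if matches(ctx, assign))
            if best_val is None or total > best_val:
                best_val, best_act = total, a
        del assign[agent]
        best_response[combo] = best_act
        # The new rule is the "message" passed to the remaining neighbours.
        new_rules.append((assign, best_val))
    return rest + new_rules, (scope, best_response)


def optimal_joint_action(rules, state, order, domains):
    """Variable elimination over agents; `order` must list every agent."""
    rules = restrict_to_state(rules, state)
    responses = []
    for agent in order:
        rules, resp = eliminate(rules, agent, domains)
        responses.append((agent, resp))
    # All remaining rules have empty contexts; their sum is the maximum value.
    best_value = sum(val for _, val in rules)
    # Back-substitute best responses in reverse elimination order.
    action: Dict[str, str] = {}
    for agent, (scope, table) in reversed(responses):
        action[agent] = table[tuple(action[v] for v in scope)]
    return action, best_value


if __name__ == "__main__":
    domains = {a: ["0", "1"] for a in ("A1", "A2", "A3")}
    rules = [
        ({"X": "hi", "A1": "1", "A2": "1"}, 5.0),
        ({"A2": "0", "A3": "1"}, 3.0),
        ({"X": "lo", "A1": "0"}, 2.0),
        ({"A3": "0"}, 1.0),
    ]
    for x in ("hi", "lo"):
        print(x, optimal_joint_action(rules, {"X": x}, ["A1", "A2", "A3"], domains))
```

In this toy example, when `X` is `"hi"` the first rule couples agents `A1` and `A2`. When `X` is `"lo"` that rule is dropped, and `A1` can choose its action independently. The same value rules therefore induce different coordination patterns in different states.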