David H. Wolpert and Kagan Tumer
We consider the design of multi-agent systems (MAS) so as to optimize an overall world utility function when each agent in the system runs a Reinforcement Learning (RL) algorithm based on its own private utility function. Traditional game theory deals with the "forward problem" of determining the state of a MAS that will ensue from a specified set of private utilities of the individual agents. Accordingly, it can be used to predict what world utility would be induced by any such set of private utilities if each agent tried to optimize its utility by using RL algorithms (under appropriate assumptions concerning rationality of those algorithms, information sets, etc.). In this work, we are interested instead in the inverse problem: how to design the private utilities so as to induce as high a value of world utility as possible. To ground the analysis in the real world, we investigate this problem in the context of minimizing the loss of importance-weighted communication data traversing a constellation of communication satellites. In our scenario, the actions taken by the agents are the introduction of virtual "ghost" traffic into the decision-making of a (pre-fixed, non-learning) distributed routing algorithm. The idea is that, if judiciously chosen, such ghost traffic can "mislead" the routing algorithm in a way that overcomes deficiencies in that algorithm and thereby improves global performance. The associated design problem is to determine private utilities for the agents that will lead them to introduce precisely that desired ghost traffic. We show in a set of computer experiments that by using inverse game theory it is possible to solve this design problem, i.e., to assign private utilities that lead the agents to introduce ghost traffic that does indeed improve global performance.
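The ghost-traffic mechanism can be illustrated with a deliberately tiny toy, which is a sketch under stated assumptions and not the authors' satellite simulator. Everything here is hypothetical: two parallel links with capacities `CAPACITIES`, made-up packet importance weights `PACKETS`, a capacity-unaware router that sends each packet to the link with the lowest perceived load (real packets plus ghost packets), and a single learning agent, for brevity, that picks how many ghost packets to inject on the bottleneck link. The agent's private utility is assumed to be its marginal contribution to world utility, G(z) minus G(z with its ghost traffic removed), which is one simple aligned choice, and it learns with a plain epsilon-greedy bandit rule.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical toy network: two parallel links (e.g. two satellite paths).
CAPACITIES = (3, 8)  # assumption: link 0 is the bottleneck (capacity in packets)
PACKETS = [1.0, 2.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0]  # made-up importance weights
GHOST_LEVELS = range(9)  # number of ghost packets the agent may inject on link 0


def route(packets: Sequence[float], ghosts: Tuple[int, int]) -> List[List[float]]:
    """Fixed, non-learning router: each packet goes to the link with the lowest
    perceived load (real packets already assigned plus ghost traffic); ties go
    to link 0. It ignores capacity, the deficiency ghost traffic can correct."""
    links: List[List[float]] = [[], []]
    for w in packets:
        perceived = [len(links[i]) + ghosts[i] for i in range(2)]
        links[perceived.index(min(perceived))].append(w)
    return links


def world_utility(ghosts: Tuple[int, int]) -> float:
    """World utility G: negative importance-weighted loss. Packets beyond a
    link's capacity are dropped, least important first (toy assumption)."""
    loss = 0.0
    for cap, load in zip(CAPACITIES, route(PACKETS, ghosts)):
        n_drop = max(0, len(load) - cap)
        loss += sum(sorted(load)[:n_drop])
    return -loss


def private_utility(ghosts: Tuple[int, int]) -> float:
    """Agent 0's private utility (assumed form): G(z) minus G(z with agent 0's
    ghost traffic set to zero), i.e. its marginal contribution to G."""
    return world_utility(ghosts) - world_utility((0, ghosts[1]))


def train_agent(episodes: int = 200, eps: float = 0.1, seed: int = 0) -> int:
    """Epsilon-greedy bandit learner for agent 0's ghost level. Infinite
    initial values force every level to be tried at least once."""
    rng = random.Random(seed)
    q = {g: float("inf") for g in GHOST_LEVELS}
    counts = {g: 0 for g in GHOST_LEVELS}
    for _ in range(episodes):
        if rng.random() < eps:
            g = rng.choice(list(GHOST_LEVELS))  # explore
        else:
            g = max(q, key=q.get)  # exploit (first maximal level wins ties)
        r = private_utility((g, 0))
        counts[g] += 1
        # Sample-average update; the first visit replaces the optimistic value.
        q[g] = r if counts[g] == 1 else q[g] + (r - q[g]) / counts[g]
    return max(q, key=q.get)


if __name__ == "__main__":
    g = train_agent()
    print("no ghosts:", world_utility((0, 0)))
    print("learned ghost level:", g, "->", world_utility((g, 0)))
```

In this toy, the unaided router splits the ten packets five and five and overloads link 0. Ghost levels 4 through 7 push enough real traffic onto link 1 to eliminate all loss, while level 8 overshoots and drops a packet on link 1, which is the sense in which the ghost traffic must be "judiciously chosen". Because the router is deterministic, every reward is deterministic, so the learner settles on the first loss-free level.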