Policy makers often confront with the following problem: how best their organization can repeatedly interact with other organizations such that the long-term utility of their organization can be maximized? This problem is difficult because policy makers usually know very little about other organizations, and therefore they cannot make perfect predictions about the other organizations’ behaviors. In this paper, we formulate this problem as social games in which (1) there are two or more agents interacting with each other; (2) each agent can perform more than one action in each interaction; and (3) the payoff matrix is not fixed; the payoff matrix varies from one situation to another. We devised a dynamic programming algorithm to compute a policy given the model of the other agent’s behavior, written in a language called SOMAprograms, a rich language for representing agent’s incomplete belief about the other agents’ behavior.
Subjects: 7.1 Multi-Agent Systems; 12.1 Reinforcement Learning
Submitted: Jun 20, 2008