Jeffrey L. Stimpson and Michael A. Goodrich
Learning in many multi-agent settings is inherently repeated play. This calls into question the naive application of single play Nash equilibria in multi-agent learning and suggests, instead, the application of give-andtake principles of bargaining. We modify and analyze a satisficing algorithm based on (Karandikar et al., 1998) that is compatible with the bargaining perspective. This algorithm is a form of relaxation search that converges to a satisficing equilibrium without knowledge of game payoffs or other agents’ actions. We then develop an M action, N player social dilemma that encodes the key elements of the Prisoner’s Dilemma. This game is instructive because it characterizes social dilemmas with more than two agents and more than two choices. We show how several different multi-agent learning algorithms behave in this social dilemma, and demonstrate that the satisficing algorithm converges, with high probability, to a Pareto efficient solution in self play and to the single play Nash equilibrium against selfish agents. Finally, we present theoretical results that characterize the behavior of the algorithm.