Jacob W. Crandall and Michael A. Goodrich
Many problems in multiagent learning involve repeated play. As such, naive application of Nash equilibrium concepts are often inappropriate. A recent algorithm in the literature uses a Nash bargaining perspective instead of a Nash equilibrium perspective, and learns to cooperate in self play in a social dilemma without exposing itself to being exploited by selfish agents. It does so without knowledge of the game or the actions of other agents. In this paper, we show that this algorithm likely converges to near pareto efficient solutions in self play in most nonzero-sum n-agent, m-action matrix games provided that parameters are set appropriately. Furthermore, we present a tremble based extension of this algorithm and show that it is guaranteed to play near pareto efficient solutions arbitrarily high percentages of the time in self play for the same large class of matrix games while allowing adaptation to changing environments.