Efficient No-Regret Multiagent Learning

Bikramjit Banerjee, Jing Peng

We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms so that they sample actions stochastically and observe only the scalar reward of the action played. We show that, with high probability and in polynomial time, the average actual payoff of the resulting learner gets (1) close to the best response against (eventually) stationary opponents, (2) close to the asymptotic optimal payoff against opponents that play a converging sequence of policies, and (3) close to at least a dynamic variant of the minimax payoff against arbitrary opponents. In addition, the polynomial bounds are shown to be significantly better than previously known bounds. Furthermore, unlike previous work, we do not need to assume that the learner knows the game matrices or can observe the opponents’ actions.
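To make the setting concrete, the following is a minimal sketch of a learner that samples its action from a mixed policy and updates using only the scalar reward of the action it played. It assumes an Exp3-style importance-weighted update, which is one well-known way to build such a learner and is not necessarily the construction used in the paper. The names `exp3_play`, `reward_fn`, and `gamma` are hypothetical and chosen for illustration only.

```python
import math
import random


def exp3_play(reward_fn, n_actions, n_rounds, gamma=0.1, seed=0):
    """Exp3-style bandit learner (illustrative assumption, not the paper's algorithm).

    reward_fn(t, action) must return a scalar reward in [0, 1] for the
    action actually played; rewards of other actions are never observed.
    Returns the average realized payoff and the last mixed policy used.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_actions
    probs = [1.0 / n_actions] * n_actions
    total_reward = 0.0
    for t in range(n_rounds):
        total_w = sum(weights)
        # Mix the weight-proportional policy with uniform exploration.
        probs = [(1 - gamma) * w / total_w + gamma / n_actions for w in weights]
        # Stochastic sampling of the action to play.
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = reward_fn(t, action)
        total_reward += reward
        # Importance-weighted estimate of the played action's reward.
        estimate = reward / probs[action]
        weights[action] *= math.exp(gamma * estimate / n_actions)
        # Rescale to avoid overflow; the induced policy is unchanged.
        max_w = max(weights)
        weights = [w / max_w for w in weights]
    avg = total_reward / n_rounds if n_rounds > 0 else 0.0
    return avg, probs


# Example: a stationary environment in which action 1 is the best response.
def stationary_reward(t, action):
    return 0.9 if action == 1 else 0.1


avg_payoff, policy = exp3_play(stationary_reward, n_actions=3, n_rounds=5000)
```

In this toy run the learner never sees a payoff matrix or an opponent action, only `stationary_reward` evaluated at the action it chose. The exploration rate `gamma` caps how close the average payoff can get to the best-response payoff, which is why guarantees of this kind are stated as "close to" rather than equal.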

Content Area: 1. Agents/Multiagent Systems

Subjects: 7.1 Multi-Agent Systems; 12.1 Reinforcement Learning

Submitted: May 10, 2005
