Maarten Peeters, Katja Verbeeck, and Ann Nowé
Coordinating on an equilibrium point is an interesting problem in multi-agent reinforcement learning. In common-interest single-stage settings this problem has been studied extensively, and efficient solution techniques have been found. For particular multi-stage games, some experiments also show good results. However, for a large class of problems the agents do not share a common pay-off function. For single-stage problems of this kind, a solution technique exists that finds a fair solution for all agents. In this paper we report on a technique that is based on learning automata theory and periodical policies. Letting pseudo-independent agents play periodical policies enables them to behave socially in purely conflicting multi-stage games as defined by E. Billard. We experimented with this technique on games in which simple learning automata tend either not to cooperate or to show oscillating behavior, resulting in a suboptimal pay-off. Simulation results illustrate that our technique overcomes these problems and that our agents find a solution that is fair to both.