Julien Laumônier and Brahim Chaib-draa
Some game-theoretic approaches to multiagent reinforcement learning in self-play, i.e. when all agents use the same algorithm to choose their actions, employ equilibria, such as the Nash equilibrium, to compute the agents' policies. These approaches have so far been applied only to simple examples. In this paper, we present an extended version of Nash Q-Learning that uses the Stackelberg equilibrium to address a wider range of games than Nash Q-Learning alone. We show that mixing the Nash and Stackelberg equilibria can lead to better rewards, not only in static games but also in stochastic games. Moreover, we apply the algorithm to a real-world example, the automated vehicle coordination problem.