Norihiko Ono and Kenji Fukumoto, University of Tokushima, Japan
To investigate the potential and limitations of multi-agent reinforcement learning, several attempts have been made to let multiple monolithic reinforcement-learning agents synthesize the coordinated decision policies needed to accomplish their common goals effectively. Most of these straightforward reinforcement-learning approaches, however, scale poorly to more complex multi-agent learning problems, because the state space for each learning agent grows exponentially with the number of partner agents engaged in the joint task. In this paper, we consider a modified version of the Pursuit Problem as a multi-agent learning problem that is computationally intractable for these straightforward approaches. We show how a collection of modular Q-learning hunter agents successfully synthesizes the coordinated decision policies needed to capture a randomly fleeing prey agent, by specializing their individual functionality and synthesizing herding behavior.
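To illustrate the modular idea behind this abstract, the following is a minimal sketch, not the authors' exact formulation: each hunter keeps one small Q-learning module per partner (so each module's state stays fixed-size regardless of how many hunters join the task), and a mediator picks the action with the greatest summed Q-value across modules (a "greatest-mass" merge). Class and parameter names (QModule, ModularQHunter, alpha, gamma, epsilon) are illustrative assumptions.

```python
import random
from collections import defaultdict

class QModule:
    """One learning module: observes the prey and a single partner hunter,
    so its state space does not grow with the total number of hunters."""
    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # (state, action) -> Q-value
        self.actions = actions
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning update within this module.
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

class ModularQHunter:
    """Hunter that combines its modules' preferences with a greatest-mass
    mediator: choose the action whose summed Q-value is largest."""
    def __init__(self, n_partners, actions, epsilon=0.1):
        self.modules = [QModule(actions) for _ in range(n_partners)]
        self.actions = actions
        self.epsilon = epsilon        # exploration rate (assumed value)

    def act(self, module_states):
        # module_states[i] is the local state observed by module i.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        def mass(a):
            return sum(m.q[(s, a)] for m, s in zip(self.modules, module_states))
        return max(self.actions, key=mass)

    def learn(self, module_states, action, reward, next_module_states):
        # Every module learns from the same executed action and reward.
        for m, s, s_next in zip(self.modules, module_states, next_module_states):
            m.update(s, action, reward, s_next)
```

Under these assumptions, adding another partner hunter adds one module of fixed size rather than multiplying the joint state space, which is the scaling advantage the abstract contrasts with monolithic learners.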