Sandip Sen and Neeraj Arora
Agents that learn about other agents and can exploit this information possess a distinct advantage in competitive situations. Games provide stylized adversarial environments to study agent learning strategies. Researchers have developed game playing programs that learn to play better from experience. We have developed a learning program that does not learn to play better, but learns to identify and exploit the weaknesses of a particular opponent by repeatedly playing it over several games. We propose a scheme for learning opponent action probabilities and a utility maximization framework that exploits this learned opponent model. We show that the proposed expected utility maximization strategy generalizes the traditional maximin strategy, and allows players to benefit by taking calculated risks that are avoided by the maximin strategy. Experiments in the popular board game of Connect-4 show that a learning player consistently outperforms a non-learning player when pitted against another automated player using a weaker heuristic. Though our proposed mechanism does not improve the skill level of a computer player, it does improve its ability to play more effectively against a weaker opponent.