Peter McCracken and Michael Bowling
Research in opponent modelling has shown success, but a fundamental question has been overlooked: what happens when a modeller is faced with an opponent that cannot be successfully modelled? Many opponent modellers could do arbitrarily poorly against such an opponent. In this paper, we aim to augment opponent modelling techniques with a method that enables models to be used safely. We introduce epsilon-safe strategies, which bound by epsilon the possible loss versus a safe value. We also introduce the Safe Policy Selection algorithm (SPS) as a method to vary epsilon in a controlled fashion. We prove in the limit that an agent using SPS is guaranteed to attain at least a safety value in the cases when the opponent modelling is ineffective. We also show empirical evidence that SPS does not adversely affect agents that are capable of modelling the opponent. Tests with a domain of complicated modellers show that SPS is effective at eliminating losses while retaining wins in a variety of modelling algorithms.