Reinforcement Learning under Threats

  • Victor Gallego Instituto de Ciencias Matemáticas
  • Roi Naveiro Instituto de Ciencias Matemáticas
  • David Rios Insua Instituto de Ciencias Matemáticas

Abstract

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. However, when non-stationary environments as such are considered, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretical approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we shall face the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.

Published
2019-07-17
Section
Student Abstract Track