Reinforcement Learning under Threats

Authors

  • Victor Gallego, Instituto de Ciencias Matemáticas
  • Roi Naveiro, Instituto de Ciencias Matemáticas
  • David Rios Insua, Instituto de Ciencias Matemáticas

DOI:

https://doi.org/10.1609/aaai.v33i01.33019939

Abstract

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward-generating process. However, in such non-stationary environments, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretic approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we address the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.
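Although the abstract omits the learning rule, the core idea, augmenting Q-learning with an explicit model of the adversary, can be sketched in a few lines. The Python sketch below is illustrative only and rests on assumptions not stated above: a tabular Q-function over joint actions Q(s, a, b), a level-1 opponent model built from empirical counts of the adversary's past actions, and a toy matching-pennies-style interaction; the class name, environment, and hyperparameters are all hypothetical.

    import numpy as np

    # Minimal sketch of Q-learning in a Threatened MDP (TMDP). Hypothetical
    # formulation: the supported DM keeps a Q-table over joint actions
    # Q(s, a, b) plus a level-1 opponent model, i.e. empirical counts of the
    # adversary's past actions per state. Not the authors' exact algorithm.

    class TMDPAgent:
        def __init__(self, n_states, n_actions, n_threats,
                     alpha=0.1, gamma=0.95, epsilon=0.1):
            self.Q = np.zeros((n_states, n_actions, n_threats))
            self.counts = np.ones((n_states, n_threats))  # opponent-model prior
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def opponent_policy(self, s):
            # Estimated distribution p(b | s) over the adversary's actions.
            return self.counts[s] / self.counts[s].sum()

        def act(self, s, rng):
            if rng.random() < self.epsilon:
                return int(rng.integers(self.Q.shape[1]))
            # Maximize Q averaged over the modeled threat distribution.
            expected_q = self.Q[s] @ self.opponent_policy(s)
            return int(np.argmax(expected_q))

        def update(self, s, a, b, r, s_next):
            self.counts[s, b] += 1  # refine the opponent model
            next_value = np.max(self.Q[s_next] @ self.opponent_policy(s_next))
            td_error = r + self.gamma * next_value - self.Q[s, a, b]
            self.Q[s, a, b] += self.alpha * td_error

    # Toy usage: a single-state, matching-pennies-style interaction against a
    # stationary (level-0) adversary that plays action 1 with probability 0.7.
    rng = np.random.default_rng(0)
    agent = TMDPAgent(n_states=1, n_actions=2, n_threats=2)
    for _ in range(5000):
        s = 0
        a = agent.act(s, rng)
        b = int(rng.random() < 0.7)
        r = 1.0 if a != b else -1.0
        agent.update(s, a, b, r, s_next=0)

The expected-Q step is what distinguishes this sketch from plain Q-learning: the DM averages over its current belief about the threat's behavior rather than treating the environment as stationary.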

Published

2019-07-17

How to Cite

Gallego, V., Naveiro, R., & Insua, D. R. (2019). Reinforcement Learning under Threats. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 9939-9940. https://doi.org/10.1609/aaai.v33i01.33019939

Section

Student Abstract Track