Are Strong Policies Also Good Playout Policies? Playout Policy Optimization for RTS Games

Authors

  • Zuozhi Yang, Drexel University
  • Santiago Ontañón, Drexel University

DOI:

https://doi.org/10.1609/aiide.v16i1.7423

Abstract

Monte Carlo Tree Search (MCTS) has been successfully applied to complex domains such as computer Go. However, despite its success in building game-playing agents, there is little understanding of general principles for designing or learning its playout policy. Many systems, such as AlphaGo, use a policy optimized to mimic human experts as the playout policy. But are strong policies good playout policies? In this paper, we present a case study in real-time strategy (RTS) games. We use bandit algorithms to optimize stochastic policies both as gameplay policies and as playout policies for MCTS in the context of RTS games. Our results show that strong policies do not make the best playout policies, and that policies that maximize MCTS performance as playout policies are actually weak in terms of gameplay strength.
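To make the setup concrete, here is a minimal Python sketch of the general idea described in the abstract: treating a handful of candidate stochastic playout policies as arms of a multi-armed bandit (UCB1 here) and selecting the one that maximizes estimated MCTS win rate. This is an illustrative reconstruction, not the paper's actual method or code; the candidate names, win probabilities, and the `simulate_mcts_game` stub are hypothetical placeholders.

```python
# Hypothetical sketch: bandit selection over candidate playout policies.
import math
import random

CANDIDATE_POLICIES = ["uniform", "aggressive", "defensive"]  # hypothetical arms

def simulate_mcts_game(policy_name: str) -> float:
    """Placeholder: run one MCTS game using `policy_name` as the playout
    policy and return 1.0 for a win, 0.0 for a loss. The win probabilities
    below are made up for illustration only."""
    win_prob = {"uniform": 0.45, "aggressive": 0.50, "defensive": 0.40}
    return 1.0 if random.random() < win_prob[policy_name] else 0.0

def ucb1_select(counts, rewards, t):
    """UCB1: pick the arm maximizing mean reward plus an exploration bonus."""
    best_arm, best_score = None, -float("inf")
    for arm in counts:
        if counts[arm] == 0:
            return arm  # play each arm once before applying the formula
        score = rewards[arm] / counts[arm] + math.sqrt(2 * math.log(t) / counts[arm])
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm

counts = {a: 0 for a in CANDIDATE_POLICIES}
rewards = {a: 0.0 for a in CANDIDATE_POLICIES}
for t in range(1, 1001):
    arm = ucb1_select(counts, rewards, t)
    counts[arm] += 1
    rewards[arm] += simulate_mcts_game(arm)

best = max(CANDIDATE_POLICIES, key=lambda a: rewards[a] / max(counts[a], 1))
print("Best playout policy by estimated MCTS win rate:", best)
```

The paper's finding is that the policy selected this way (best as a playout policy) need not be the same policy that wins most often when used directly as a gameplay policy.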

Published

2020-10-01

How to Cite

Yang, Z., & Ontañón, S. (2020). Are Strong Policies Also Good Playout Policies? Playout Policy Optimization for RTS Games. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 16(1), 144-150. https://doi.org/10.1609/aiide.v16i1.7423