Determinantal Reinforcement Learning

Takayuki Osogami; Rudy Raymond

doi:10.1609/aaai.v33i01.33014659

Authors

Takayuki Osogami, IBM Research - Tokyo
Rudy Raymond, IBM Research - Tokyo

DOI:

https://doi.org/10.1609/aaai.v33i01.33014659

Abstract

We study reinforcement learning for controlling multiple agents in a collaborative manner. In some of these tasks, it is not enough for each agent to take a relevant action; the agents' actions must also be diverse. We propose using the determinant of a positive semidefinite matrix to approximate the action-value function in reinforcement learning, where the matrix is learned so that it represents both the relevance and the diversity of the actions. Experimental results show that the proposed approach allows the agents to learn a nearly optimal policy approximately ten times faster than baseline approaches in benchmark tasks of multi-agent reinforcement learning. The proposed approach is also shown to achieve performance that conventional approaches cannot achieve in a partially observable environment with an exponentially large action space.
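To make the role of the determinant concrete, the following is a minimal sketch of how the determinant of a positive semidefinite matrix can reward both relevance and diversity. It is an illustration only and not the authors' implementation: the kernel construction (a relevance-weighted Gram matrix), the function names `gram_kernel`, `det`, and `joint_score`, and the toy numbers are all assumptions introduced here. The paper learns the matrix from experience, whereas this sketch fixes it by hand.

```python
def gram_kernel(relevance, features):
    """Build L[i][j] = r_i * <f_i, f_j> * r_j (hypothetical construction).

    A matrix of this form is a Gram matrix, so it is positive semidefinite.
    """
    n = len(relevance)
    return [
        [
            relevance[i] * relevance[j]
            * sum(a * b for a, b in zip(features[i], features[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def joint_score(kernel, chosen):
    """Score a joint action as the determinant of the principal submatrix
    indexed by the actions that the agents chose."""
    sub = [[kernel[i][j] for j in chosen] for i in chosen]
    return det(sub)


# Toy setting (assumed): three candidate actions.
# Actions 0 and 1 are highly relevant but identical in feature space;
# action 2 is less relevant but orthogonal to both.
relevance = [2.0, 2.0, 1.5]
features = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
kernel = gram_kernel(relevance, features)

redundant_pair = joint_score(kernel, [0, 1])  # 0.0: no diversity
diverse_pair = joint_score(kernel, [0, 2])    # 9.0: relevant and diverse
```

In this toy setting the pair of the two most relevant actions scores zero because they are redundant, while the pair that includes the less relevant but different action scores higher. This is the qualitative behavior the abstract describes: the determinant is large only when the chosen actions are both relevant and diverse.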
