Q-Decomposition for Reinforcement Learning Agents

Authors

Stuart Russell and Andrew L. Zimdars

Proceedings:

Proceedings of the Twentieth International Conference on Machine Learning

Volume

Issue:

Proceedings of the Twentieth International Conference on Machine Learning

Track:

Contents

Downloads:

Download PDF

Abstract:

The paper explores a very simple agent design method called Q-decomposition, wherein a complex agent is built from simpler subagents. Each subagent has its own reward function and runs its own reinforcement learning process. It supplies to a central arbitrator the Q-values (according to its own reward function) for each possible action. The arbitrator selects an action maximizing the sum of Q-values from all the subagents. This approach has advantages over designs which subagents recommend actions. It also has the property that if each subagent runs the Sarsa reinforcement learning algorithm to learn its local Q-function, then a globally optimal policy achieved. (On the other hand, local Q-learning leads to globally suboptimal behavior.) In some cases, this form of agent decomposition allows the local Q-functions to be expressed by muchreduced state and action spaces. These results are illustrated in two domains that require effective coordination of behaviors.

ICML

Proceedings of the Twentieth International Conference on Machine Learning

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.