Self-Supervised Mixture-of-Experts by Uncertainty Estimation

Zhuobin Zheng; Chun Yuan; Xinrui Zhu; Zhihui Lin; Yangyang Cheng; Cheng Shi; Jiahui Ye

doi:10.1609/aaai.v33i01.33015933

Authors

Zhuobin Zheng Tsinghua University
Chun Yuan Tsinghua University
Xinrui Zhu Tsinghua University
Zhihui Lin Tsinghua University
Yangyang Cheng Tsinghua University
Cheng Shi Tsinghua University
Jiahui Ye Tsinghua University


Abstract

Learning related tasks in various domains and transferring the acquired knowledge to new situations is a significant challenge in Reinforcement Learning (RL). However, most RL algorithms are data inefficient and fail to generalize in complex environments, which limits their adaptability and applicability in multi-task scenarios. In this paper, we propose Self-Supervised Mixture-of-Experts (SUM), an effective algorithm for multi-task RL driven by predictive uncertainty estimation. SUM uses a multi-head agent with shared parameters as experts to learn a series of related tasks simultaneously with Deep Deterministic Policy Gradient (DDPG). Each expert is extended with predictive uncertainty estimation on known and unknown states, which strengthens its Q-value evaluation against overfitting and improves overall generalization. Together, these enable the agent to capture and diffuse common knowledge across tasks, improving both sample efficiency in each task and the effectiveness of expert scheduling across tasks. Instead of the task-specific design common in MoEs, a self-supervised gating network determines a potential expert to handle each interaction from unseen environments; the gate is calibrated entirely by uncertainty feedback from the experts, without explicit supervision. To alleviate imbalanced expert utilization, the crux of MoE, optimization is performed via decayed-masked experience replay, which encourages both diversification and specialization of experts during different periods of training. We demonstrate that our approach learns faster and achieves better performance through efficient transfer and robust generalization, outperforming several related methods on extended OpenAI Gym’s MuJoCo multi-task environments.
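The abstract names two mechanisms without giving their form: routing each interaction to an expert using uncertainty feedback, and a decayed mask on experience replay. The sketch below illustrates one possible reading of each, under loudly stated assumptions. Each expert head is assumed to output a Q-value mean and a variance. The learned self-supervised gate is simplified to a hard argmin over that variance. The replay mask is assumed to let the chosen expert always train on a transition, while every other expert trains on it with a probability that decays exponentially over time, so early training is shared (diversification) and later training is specialized. The names `ExpertOutput`, `select_expert`, `share_probability`, `replay_mask`, and the schedule parameters `p_start`, `p_end`, `decay` are all hypothetical and do not come from the paper.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ExpertOutput:
    q_mean: float  # predicted Q-value of one expert head (assumed output)
    q_var: float   # predicted variance, used here as the uncertainty signal (assumed)


def select_expert(outputs: List[ExpertOutput]) -> int:
    """Hypothetical gate: route the interaction to the least uncertain expert.

    This hard argmin stands in for the learned, self-supervised gating
    network described in the abstract; it is a simplification, not SUM's gate.
    """
    return min(range(len(outputs)), key=lambda i: outputs[i].q_var)


def share_probability(step: int, p_start: float = 1.0,
                      p_end: float = 0.1, decay: float = 1e-4) -> float:
    """Hypothetical exponential schedule: sharing probability decays from p_start to p_end."""
    return p_end + (p_start - p_end) * math.exp(-decay * step)


def replay_mask(chosen: int, n_experts: int, step: int,
                rng: random.Random) -> List[int]:
    """Hypothetical decayed mask: which experts train on a stored transition.

    The chosen expert always trains on it; every other expert does so with
    probability share_probability(step), which shrinks as training proceeds.
    """
    p = share_probability(step)
    return [1 if i == chosen or rng.random() < p else 0
            for i in range(n_experts)]


if __name__ == "__main__":
    rng = random.Random(0)
    outputs = [ExpertOutput(1.2, 0.5), ExpertOutput(0.9, 0.1), ExpertOutput(1.5, 0.8)]
    k = select_expert(outputs)  # expert 1 has the lowest variance
    print(k, replay_mask(k, len(outputs), step=0, rng=rng))
    print(k, replay_mask(k, len(outputs), step=100_000, rng=rng))
```

Early in training the mask lets nearly every expert learn from each transition. Later, most transitions reach only the expert that handled them. This is one simple way to trade diversification for specialization over time. The paper's actual schedule and gating network may differ.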
