Bootstrap Estimated Uncertainty of the Environment Model for Model-Based Reinforcement Learning
Model-based reinforcement learning (RL) methods learn a dynamics model that simulates the real environment and use that model to make better decisions. However, the learned simulator often contains some model error, which can mislead decision-making and reduce performance. We propose a bootstrapped model-based RL method that bootstraps the modules at each depth of the planning tree. The method quantifies the uncertainty of the environment model for different state-action pairs and leads the agent to explore pairs with higher uncertainty, reducing potential model errors. Moreover, we sample target values from their bootstrap distribution to connect the uncertainties at the current and subsequent time-steps, and we introduce a prior mechanism to improve exploration efficiency. Experimental results demonstrate that our method efficiently decreases model error and outperforms TreeQN and other state-of-the-art methods on multiple Atari games.
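To make the idea of bootstrap-estimated model uncertainty concrete, the following is a minimal sketch and not the method proposed here. It swaps the planning-tree modules for a toy linear dynamics model fitted by least squares, and it omits target sampling and the prior mechanism. All names (`fit_bootstrap_heads`, `predictive_uncertainty`, `most_uncertain_action`) and the toy data are assumptions made for illustration. Each bootstrap head is fitted on a resample of the data, the spread of head predictions at a state-action pair serves as its uncertainty, and the agent is steered toward the pair with the largest spread.

```python
import numpy as np


def fit_bootstrap_heads(X, y, n_heads, rng):
    """Fit one least-squares dynamics model per bootstrap resample of (X, y)."""
    n = X.shape[0]
    heads = []
    for _ in range(n_heads):
        idx = rng.integers(0, n, size=n)  # resample transitions with replacement
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        heads.append(w)
    return np.stack(heads)  # shape: (n_heads, n_features)


def predictive_uncertainty(heads, x):
    """Standard deviation of the heads' predictions at feature vector x."""
    return float((heads @ x).std())


def most_uncertain_action(heads, state, actions):
    """Return the action whose (state, action) pair has the largest head disagreement."""
    scores = [predictive_uncertainty(heads, np.append(state, a)) for a in actions]
    return actions[int(np.argmax(scores))], scores


# Toy data (hypothetical): features are [state, action], target is the next state.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w_true = np.array([0.9, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

heads = fit_bootstrap_heads(X, y, n_heads=10, rng=rng)
state = np.array([0.5])
actions = [0.0, 1.0, 5.0]
chosen, scores = most_uncertain_action(heads, state, actions)
print("uncertainty per action:", scores)
print("action selected for exploration:", chosen)
```

In this toy setting the head disagreement grows as the queried pair moves away from the training data, so the exploration rule tends to prefer actions the model has seen least. A deep model-based agent would replace the linear heads with bootstrapped network modules, but the resample, fit, and measure-disagreement pattern is the same.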