DBA: Dynamic Multi-Armed Bandit Algorithm

  • Sadegh Nobari Rakuten, Inc.

Abstract

We introduce Dynamic Bandit Algorithm (DBA), a practical solution to improve the shortcoming of the pervasively employed reinforcement learning algorithm called Multi-Arm Bandit, aka Bandit. Bandit makes real-time decisions based on the prior observations. However, Bandit is heavily biased to the priors that it cannot quickly adapt itself to a trend that is interchanging. As a result, Bandit cannot, quickly enough, make profitable decisions when the trend is changing. Unlike Bandit, DBA focuses on quickly adapting itself to detect these trends early enough. Furthermore, DBA remains as almost as light as Bandit in terms of computations. Therefore, DBA can be easily deployed in production as a light process similar to The Bandit. We demonstrate how critical and beneficial is the main focus of DBA, i.e. the ability to quickly finding the most profitable option in real-time, over its stateof-the-art competitors. Our experiments are augmented with a visualization mechanism that explains the profitability of the decisions made by each algorithm in each step by animations. Finally we observe that DBA can substantially outperform the original Bandit by close to 3 times for a set Key Performance Indicator (KPI) in a case of having 3 arms.

Published
2019-07-17
Section
Demonstration Track