Jason D. Williams
A common problem for real-world POMDP applications is how to incorporate expert knowledge and constraints such as business rules into the optimization process. This paper describes a simple approach created in the course of developing a spoken dialog system. A POMDP and a conventional handcrafted dialog controller run in parallel; the conventional dialog controller nominates a set of one or more actions, and the POMDP chooses the optimal action from that set. This allows designers to express real-world constraints in a familiar manner, and also prunes the search space of policies. The method naturally admits compression, and the POMDP value function can draw on features from both the POMDP belief state and the handcrafted dialog controller. The method has been used to build a full-scale dialog system which is currently running at AT&T Labs. An evaluation shows that this unified architecture yields better performance than using a conventional dialog manager alone, and also demonstrates an improvement in optimization speed and reliability versus a pure POMDP.
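The nominate-then-choose idea in the abstract can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's implementation: the handcrafted controller, Q-values, action names, and belief values are all hypothetical stand-ins, and the POMDP policy is reduced to a toy scoring function over the top belief hypothesis.

```python
# Illustrative sketch of the parallel architecture: a handcrafted
# controller nominates permitted actions, and the POMDP policy
# chooses the best action within that nominated set.
# All names and values below are hypothetical, not from the paper.

def handcrafted_controller(dialog_state):
    """Nominate one or more actions permitted by business rules."""
    if dialog_state == "start":
        return ["greet"]  # a rule: always greet first
    return ["ask_slot", "confirm", "transfer"]

def pomdp_q_values(belief, actions):
    """Toy Q-values: expected return of each action under the belief.

    Here the confidence in the top belief hypothesis drives the scores.
    """
    top_confidence = max(belief.values())
    scores = {
        "greet": 1.0,
        "ask_slot": 1.0 - top_confidence,  # ask again when uncertain
        "confirm": top_confidence - 0.5,   # confirm when fairly sure
        "transfer": top_confidence - 0.8,  # act only when very sure
    }
    # Restricting to the nominated actions prunes the policy search space.
    return {a: scores[a] for a in actions}

def choose_action(belief, dialog_state):
    """POMDP picks the optimal action among those the controller allows."""
    nominated = handcrafted_controller(dialog_state)
    q = pomdp_q_values(belief, nominated)
    return max(q, key=q.get)

belief = {"flight_to_boston": 0.9, "flight_to_austin": 0.1}
print(choose_action(belief, "in_progress"))  # prints "confirm"
```

With a confident belief (0.9 on the top hypothesis), the POMDP selects "confirm" rather than re-asking, but it could never select an action the designer's rules exclude from the nominated set.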
Subjects: 12.1 Reinforcement Learning; 6. Computer-Human Interaction
Submitted: Apr 29, 2008