Finale Doshi, Nicholas Roy
Intelligent planning algorithms such as the Partially Observable Markov Decision Process (POMDP) have succeeded in dialog management applications because they are robust to the uncertainties inherent in human-robot interaction. Like all dialog planning systems, POMDPs require an accurate model of what the user might say and how they wish to interact with the robot. In the POMDP framework, the user's vocabulary and preferences are generally specified using a large probabilistic model with many parameters. While it may be easy for an expert to specify reasonable values for these parameters, gathering enough data to specify them accurately is expensive. In this paper, we take a Bayesian approach to learning the user model while simultaneously refining the dialog manager's policy. First, we show how to compute the optimal dialog policy under uncertain parameters (in the absence of learning), along with a heuristic that allows the dialog manager to intelligently replan its policy given data from recent interactions. Next, we present a pair of approaches that explicitly consider the robot's uncertainty about the true user model when taking actions; we show these approaches can learn user preferences more robustly. A key contribution of this work is the use of "meta-actions," queries about what the robot should have done, to discover a user's dialog preferences without making mistakes that may annoy the user.
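The Bayesian parameter learning described above can be illustrated with a minimal sketch. The Dirichlet parameterization, function names, and toy vocabulary below are illustrative assumptions, not the authors' actual implementation: a Dirichlet prior over the user's utterance probabilities is updated with counts from recent interactions, and the posterior mean gives the refined observation model.

```python
from collections import Counter

# Hypothetical sketch (not the paper's implementation): Bayesian updating
# of an uncertain user observation model via Dirichlet pseudo-counts.

def update_dirichlet(alpha, observations):
    """Add observed utterance counts from recent dialogs to the
    Dirichlet pseudo-counts alpha (dict: utterance -> pseudo-count)."""
    counts = Counter(observations)
    return {utt: alpha.get(utt, 0.0) + counts.get(utt, 0)
            for utt in set(alpha) | set(counts)}

def posterior_mean(alpha):
    """Expected utterance probabilities under the Dirichlet posterior."""
    total = sum(alpha.values())
    return {utt: a / total for utt, a in alpha.items()}

# Illustrative prior over what the user says when requesting a drink.
prior = {"coffee": 2.0, "tea": 1.0, "unclear": 1.0}
posterior = update_dirichlet(prior, ["coffee", "coffee", "unclear"])
probs = posterior_mean(posterior)  # probability of "coffee" rises to 4/7
```

The same machinery extends to the meta-actions mentioned above: a user's answer to "should I have done X?" is simply another observation that updates the posterior over preference parameters, without the robot having to risk an annoying mistake to gather that evidence.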
Subjects: 12. Machine Learning and Discovery; 6. Computer-Human Interaction
Submitted: Jan 26, 2007