Stephen Sutton, Brian Hansen, Terri Lander, David G. Novick, and Ronald Cole
We present and apply an empirical methodology for evaluating the effectiveness of dialogues in spoken language systems. This methodology is particularly suitable for evaluating dialogue-based systems that collect information from the user, such as an automated spoken questionnaire. Our method for assessing effectiveness involves coding answers from users for responsiveness. For this effort, we developed a behavioral coding scheme tailored to the requirements of automated spoken questionnaires interacting via the telephone. The codes cover a range of behavior from "Concise" to "No response." We have used this evaluation methodology in the development of an automated spoken questionnaire. In connection with this project, we collected over 4,000 telephone calls responding to the questionnaire. A sample of the calls was transcribed and coded using our behavioral coding scheme. We then used the data from the codes to choose among alternative protocols for the dialogue and to evaluate differences in system voice, such as natural versus synthetic and male versus female. In particular, we illustrate the utility of our methodology by testing the hypothesis that a synthesized system voice would elicit more constrained user responses than a human voice, and we report the evaluation results.
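As a rough illustration of how coded answers might be compared across system voices, the following minimal Python sketch tallies behavioral codes per voice condition and converts them to proportions. It is not the authors' analysis: the sample records are invented for illustration, the function name `code_proportions` is hypothetical, and only the codes "Concise" and "No response" come from the text above ("Other" stands in for the intermediate codes, which are not listed here).

```python
from collections import Counter, defaultdict

# Invented sample data, for illustration only: (voice condition, behavioral code).
# "Concise" and "No response" are named in the abstract; "Other" is a
# placeholder for the intermediate codes, which the abstract does not list.
coded_responses = [
    ("synthetic", "Concise"),
    ("synthetic", "Concise"),
    ("synthetic", "Other"),
    ("human", "Concise"),
    ("human", "Other"),
    ("human", "No response"),
]


def code_proportions(responses):
    """Return, for each voice condition, the share of answers given each code."""
    counts = defaultdict(Counter)
    for condition, code in responses:
        counts[condition][code] += 1

    proportions = {}
    for condition, counter in counts.items():
        total = sum(counter.values())
        proportions[condition] = {code: n / total for code, n in counter.items()}
    return proportions


proportions = code_proportions(coded_responses)
for condition in sorted(proportions):
    print(condition, proportions[condition])
```

Under the hypothesis stated above, one would look for a higher share of "Concise" codes in the synthetic condition than in the human condition; deciding whether an observed difference is reliable would require a statistical test on the real coded sample, which this sketch does not attempt.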