Mark G. Core
We have been annotating TRAINS dialogs with dialog acts, both to produce training data for a dialog act predictor and to study how language is used in these dialogs. We use the DAMSL dialog act scheme, which consists of 15 independent attributes. For the purposes of this paper, infrequent attributes such as Unintelligible and Self-Talk were set aside in order to concentrate on the eight major DAMSL tag sets. For five of these eight tag sets, hand-constructed decision trees (based solely on the previous utterance's DAMSL tags) outperformed the baseline of always guessing the most frequent DAMSL tag value. This result suggests that it is possible to build such decision trees automatically, especially if other sources of context are added. Our initial efforts toward the second goal (studying language use in the TRAINS dialogs) consist of measuring DAMSL tag co-occurrences and bigrams. Some interesting patterns have emerged from this simple analysis, such as the fact that non-understanding is often signaled through questions. These patterns suggest that we should also consider an n-gram dialog act model for predicting DAMSL tags.
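As a rough illustration of the two measurements mentioned above (tag bigrams and the most-frequent-tag baseline), the following Python sketch counts adjacent tag pairs and computes the accuracy of always guessing the most common tag. It is a minimal sketch under stated assumptions, not the procedure used in this work: each utterance is reduced to a single tag, whereas DAMSL assigns several independent attributes per utterance, and the names TOY_DIALOGS, tag_bigrams, and most_frequent_baseline, as well as the tag sequences themselves, are invented for illustration and are not TRAINS data.

```python
from collections import Counter

# Hypothetical toy data: each dialog is a sequence of one tag per utterance.
# These sequences are invented for illustration; they are not TRAINS data.
TOY_DIALOGS = [
    ["Info-request", "Answer", "Accept"],
    ["Info-request", "Signal-non-understanding", "Answer"],
    ["Info-request", "Answer", "Info-request", "Answer"],
]


def tag_bigrams(dialogs):
    """Count (previous_tag, current_tag) pairs within each dialog."""
    counts = Counter()
    for tags in dialogs:
        # Pair each tag with its successor; pairs never span two dialogs.
        counts.update(zip(tags, tags[1:]))
    return counts


def most_frequent_baseline(dialogs):
    """Return the most frequent tag and the accuracy of always guessing it."""
    all_tags = [tag for tags in dialogs for tag in tags]
    tag, count = Counter(all_tags).most_common(1)[0]
    return tag, count / len(all_tags)


if __name__ == "__main__":
    for (prev, cur), n in tag_bigrams(TOY_DIALOGS).most_common():
        print(f"{prev} -> {cur}: {n}")
    tag, accuracy = most_frequent_baseline(TOY_DIALOGS)
    print(f"baseline guess: {tag} (accuracy {accuracy:.2f})")
```

A predictor that conditions on the previous utterance's tag, whether a decision tree or a bigram model, would be judged against this baseline accuracy, computed separately for each tag set.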