Jacob Eisenstein, Regina Barzilay, Randall Davis
Coverbal gesture provides a channel for the visual expression of ideas. While some gestural emblems have culturally predefined forms (e.g., "thumbs up"), the relationship between gesture and meaning is, in general, not conventionalized. It is natural to ask whether such gestures can be interpreted in a speaker-independent way, or whether gestural form is determined by the speaker’s idiosyncratic view of the discourse topic. We address this question using an audiovisual dataset spanning multiple speakers and topics. Our analysis employs a hierarchical Bayesian author-topic model, in which gestural patterns are stochastically generated by a mixture of speaker-specific and topic-specific priors. These gestural patterns are characterized using automatically extracted visual features based on spatio-temporal interest points. This framework detects significant cross-speaker patterns in gesture that are governed by the discourse topic, suggesting that even unstructured gesticulation can be interpreted across speakers. In addition, the success of this approach shows that the semantic characteristics of gesture can be detected via a low-level interest-point representation.
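The generative idea behind the mixture can be illustrated with a short sketch. This is not the authors' model or inference procedure, which the abstract does not specify. It assumes that interest-point features have been quantized into a small codebook, that each speaker and each topic has a multinomial distribution over codewords drawn from a symmetric Dirichlet prior, and that a single weight decides which source generates each token. The names `sample_dirichlet`, `sample_gesture_codewords`, and `topic_weight` are hypothetical.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one distribution from a symmetric Dirichlet(alpha) prior (assumed prior)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_gesture_codewords(speaker_dist, topic_dist, n_tokens, topic_weight, rng):
    """Generate codeword indices from a speaker/topic mixture.

    Each token is drawn from the topic-specific distribution with
    probability topic_weight, otherwise from the speaker-specific one.
    """
    vocab = range(len(speaker_dist))
    tokens, sources = [], []
    for _ in range(n_tokens):
        from_topic = rng.random() < topic_weight
        dist = topic_dist if from_topic else speaker_dist
        tokens.append(rng.choices(vocab, weights=dist, k=1)[0])
        sources.append("topic" if from_topic else "speaker")
    return tokens, sources


if __name__ == "__main__":
    rng = random.Random(0)
    codebook_size = 8  # hypothetical number of interest-point codewords
    speaker = sample_dirichlet(0.5, codebook_size, rng)
    topic = sample_dirichlet(0.5, codebook_size, rng)
    tokens, sources = sample_gesture_codewords(speaker, topic, 20, 0.4, rng)
    print(tokens)
    print(sources)
```

Under these assumptions, a large inferred share of topic-generated tokens would correspond to gesture patterns that recur across speakers for the same topic, while a share near zero would indicate purely idiosyncratic gesture.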
Subjects: 13. Natural Language Processing; 13.1 Discourse
Submitted: Apr 15, 2008