Representations for Multi-Modal Human-Computer Interaction
Papers from the AAAI Workshop
Syed Ali and Susan McRoy, Cochairs
Representations for processing human communication have been concerned mainly with single modalities. Further advances, however, may require taking advantage of the fact that most human communication takes place in more than one modality at the same time.
A core problem in multi-modal human-computer interaction is how the information conveyed via multiple modalities is funneled into and out of a single underlying representation of the meaning to be communicated. On the output side, this is the information-to-media allocation problem; on the input side, it is the cross-media information fusion problem.
The aims of this workshop were, first, to assess the state of computer representations for understanding human communication in multiple modalities or for communicating with humans through multiple media, and, second, to encourage collaborative research in developing and using representations that facilitate multi-modal interaction.
Relevant modalities include visual, auditory, olfactory, haptic (touch), kinesthetic (motion/position-sensing), speech, gesture, facial expression, myoelectric signals, and neural inputs. Relevant media include video, text, handwriting, graphics, images, and animation. Effective communication through these modalities and media may depend on an underlying set of intentions, such as being informative, deceptive, persuasive, entertaining, affective, or social.