A Statistical Approach to Multimodal Natural Language Interaction

John Vergo

The Human-Centric Word Processor is a research prototype that allows users to create, edit, and manage documents. Users dictate the contents of a document using real-time continuous speech recognition. Speech recognition is coupled with pen- or mouse-based input to support command and control of the application. The system is multimodal, allowing the user to point and speak simultaneously. In particular, combining natural language understanding with multimodal input greatly facilitates the correction, formatting, organization, and manipulation of dictated text. The system uses a statistical, maximum entropy approach to map a combination of natural language and pointing events into multimodal formal language statements.
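To make the last sentence concrete, the following is a minimal sketch of a maximum entropy (multinomial logistic) classifier that maps a spoken phrase plus a count of pointing events to a formal statement template. It is an illustration only, not the system's actual model: the training phrases, the feature set (`word=...`, `points=...`), and the statement templates such as `DELETE(target)` are all hypothetical, since the abstract does not specify them.

```python
import math
from collections import defaultdict

# Hypothetical training data (not from the paper):
# (spoken words, number of pointing events, formal statement template)
TRAIN = [
    ("delete this word", 1, "DELETE(target)"),
    ("remove that", 1, "DELETE(target)"),
    ("make this bold", 1, "BOLD(target)"),
    ("bold that sentence", 1, "BOLD(target)"),
    ("move this there", 2, "MOVE(source, dest)"),
    ("put that here", 2, "MOVE(source, dest)"),
]

def features(utterance, n_points):
    """Binary features: one per spoken word, plus the pointing-event count."""
    feats = ["word=" + w for w in utterance.split()]
    feats.append("points=%d" % n_points)
    return feats

def probs(weights, labels, feats):
    """Maximum entropy form: p(y | x) is proportional to exp(sum of weights)."""
    scores = [sum(weights[(f, y)] for f in feats) for y in labels]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train(data, epochs=300, lr=0.2):
    """Fit weights by stochastic gradient ascent on conditional log-likelihood."""
    labels = sorted({y for _, _, y in data})
    weights = defaultdict(float)
    for _ in range(epochs):
        for utterance, n_points, gold in data:
            feats = features(utterance, n_points)
            p = probs(weights, labels, feats)
            for y, p_y in zip(labels, p):
                # Gradient: observed feature count minus expected count.
                grad = (1.0 if y == gold else 0.0) - p_y
                for f in feats:
                    weights[(f, y)] += lr * grad
    return weights, labels

def predict(weights, labels, utterance, n_points):
    """Return the most probable formal statement template."""
    p = probs(weights, labels, features(utterance, n_points))
    return labels[p.index(max(p))]

weights, labels = train(TRAIN)
print(predict(weights, labels, "move this there", 2))
```

In a fuller treatment the pointing events would presumably carry more than a count (for example, timing relative to the spoken words and the object under the pointer), and the chosen template's arguments would be bound to those objects. Those details are assumptions beyond what the abstract states.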

