Bruce Krulwich, Chad Burkey
Multi-modal interfaces have been proposed as a way to capture the ease and expressivity of natural communication. Interfaces of this sort allow users to communicate with computers through combinations of speech, gesture, touch, expression, etc. A critical problem in developing such an interface is integrating these different inputs (e.g., spoken sentences, pointing gestures) into a single interpretation. For example, in combining speech and gesture, a system must relate each gesture to the appropriate part of the sentence. We are investigating this problem as it arises in our talk-and-touch interfaces, which combine full-sentence speech with screen touching. Our solution, implemented in two completed prototypes, uses multi-modal semantic grammars to match screen touches to speech utterances. Through this mechanism, our systems can easily support wide variations in the speech patterns used to indicate touch references. They can also ask the user specific, focused questions when they cannot understand an input, and they can incorporate other semantic information, such as contextual references or references to referents in previous sentences, through the same unified approach. Our two prototypes appear effective in providing a straightforward and powerful interface to novice computer users.
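To make the integration problem concrete, the following is a minimal sketch (not the authors' implementation; all names and the rule structure are illustrative assumptions) of how a grammar-driven interpreter might bind a deictic phrase such as "this" to a screen touch, and fall back to a focused clarification question when no touch is available:

```python
# Hypothetical sketch of matching screen touches to speech utterances.
# The Touch class, the command vocabulary, and the binding rule are
# assumptions for illustration, not the grammars used in the prototypes.
from dataclasses import dataclass

@dataclass
class Touch:
    item: str  # the on-screen object that was touched

def interpret(words, touches):
    """Bind each deictic word in the utterance to the next unconsumed touch.

    Returns (interpretation, question): exactly one is non-None.
    """
    pending = list(touches)
    command, referents = None, []
    for w in words:
        if w in ("delete", "show", "move"):
            command = w
        elif w in ("this", "that"):
            if not pending:
                # No touch to resolve the reference: ask a focused question.
                return None, "Which item do you mean?"
            referents.append(pending.pop(0).item)
    return (command, referents), None

# A spoken command accompanied by one screen touch.
result, question = interpret("delete this one".split(),
                             [Touch("quarterly_report")])
```

Here the touch fills the semantic slot opened by "this", so `result` becomes `("delete", ["quarterly_report"])`; uttering "delete this" with no touch instead yields the clarification question.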