R. Peter Bonasso, Eric Huber, and David Kortenkamp
In this paper we describe a stereo vision system that can recognize natural gestures and an intelligent robot control architecture that can interpret these gestures in the context of a task. The intelligent control architecture consists of three layers. The top layer deliberates via a state-based, hierarchical, non-linear planner. The bottom layer consists of a suite of reactive skills that can be configured into synchronous state machines by the middle layer. The middle layer, implemented in the RAPs system, mediates between the long-range deliberation of the top layer and the continuous activity of the bottom layer. It is in the middle layer that we investigate context-focused deictic gesturing for human-robot interaction. When interpreted in the context of different RAPs, the same human gesture can be given different meanings for different tasks. This work shows that (a) our architecture, designed to support perception and action, can also support other forms of communication, and (b) task contexts can act as resources for communication by simplifying the interpretation of communicative acts.
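As a rough illustration of context-dependent interpretation, the sketch below maps the same gesture to different robot actions depending on which task is active. It is not the RAPs system or its syntax: the `Gesture` type, the `CONTEXTS` table, the task names ("navigate", "fetch"), and the `interpret` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Gesture:
    """A recognized gesture (hypothetical representation)."""
    kind: str    # e.g. "point" or "halt"
    target: str  # label of whatever the gesture indicates


def _go_to(g: Gesture) -> str:
    return f"move to {g.target}"


def _pick_up(g: Gesture) -> str:
    return f"pick up {g.target}"


def _stop(g: Gesture) -> str:
    return "stop"


# Each hypothetical task context defines what a gesture kind means.
# The active context narrows the set of interpretations to consider.
CONTEXTS: Dict[str, Dict[str, Callable[[Gesture], str]]] = {
    "navigate": {"point": _go_to, "halt": _stop},
    "fetch": {"point": _pick_up, "halt": _stop},
}


def interpret(context: str, gesture: Gesture) -> str:
    """Interpret a gesture under the given task context.

    Gestures with no meaning in the active context are ignored.
    """
    handler = CONTEXTS[context].get(gesture.kind)
    if handler is None:
        return "ignore"
    return handler(gesture)


if __name__ == "__main__":
    point = Gesture(kind="point", target="doorway")
    print(interpret("navigate", point))
    print(interpret("fetch", point))
```

In this sketch the pointing gesture yields a motion command under the "navigate" context and a manipulation command under the "fetch" context, which is the sense in which a task context can simplify interpretation: the interpreter only has to choose among the meanings the active task allows.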