Show and Tell: Using Speech Input for Image Interpretation and Annotation

Rohini K. Srihari, Zhongfei Zhang and Ranjiv Chopra

This research concerns the exploitation of linguistic context in vision. Linguistic context is qualitative in nature and is obtained dynamically. We view this as a new paradigm, a golden mean between purely data-driven object detection and site-model-based vision. Our solution not only proposes new techniques for using qualitative contextual information, but also efficiently exploits existing image interpretation technology. The design and implementation of Show&Tell, a multimedia system for semi-automated image annotation, is discussed. This system, which combines advances in speech recognition, natural language processing, and image understanding, is designed to facilitate the work of image analysts (IAs). Adaptation of the current prototype to the tasks of change profiling and change detection is also discussed.

This page is copyrighted by AAAI. All rights reserved.