Rohini K. Srihari, Adrian Novischi
This research explores the interaction of textual and visual information in video indexing and search. Much recent work has focused on machine learning techniques that learn jointly from text and image/video features, e.g., the text surrounding a photograph on a web page. This is useful in similarity search (i.e., search by example), but has drawbacks when more semantic search is desired, e.g., finding video clips of Obama meeting with ordinary citizens. By extracting key visual semantics from the audio/text accompanying a video, we are able to enhance the precision and granularity of video search. Visual semantics involves identifying linguistic triggers and correlating them with visual properties of the accompanying video or images. Significant progress has been made in text-based information extraction, and these techniques can be brought to bear on video search. In this paper, we focus on linguistic triggers related to a special class of events referred to as nominal events. We describe how proper detection and interpretation of such events can prevent false positives in video search.
Submitted: Sep 8, 2008