My thesis aims to contribute towards building autonomous agents that are able to understand their surrounding environment through the use of both audio and visual information. To capture a more complete description of a scene, the fusion of audio and visual information can be advantageous in enhancing the system’s context awareness. The goal of this work is on the characterization of unstructured environmental sounds for understanding and predicting the context surrounding of an agent. Most research on audio recognition has focused primarily on speech and music. Less attention has been paid to the challenges and opportunities for using audio to characterize unstructured environments. Unlike speech and music, which have formantic structures and harmonic structures, environmental sounds are considered unstructured since they are variably composed from different sound sources. My research will investigate challenging issues in characterizing environmental sounds such as the development of appropriate features extraction algorithm and learning techniques for modeling the dynamics of the environment. A final aspect of my research will consider the decision making of an autonomous agent based on the fusion of visual and audio information.
Subjects: 12. Machine Learning and Discovery; 17. Robotics
Submitted: Apr 10, 2008