Kazuhiro Nakadai, Kitano Symbiotic Systems Project, ERATO; Hiroshi G. Okuno, Kitano Symbiotic Systems Project, ERATO and Kyoto University; Hiroaki Kitano, Kitano Symbiotic Systems Project, ERATO and Sony Computer Science Laboratories, Inc.
A robot’s auditory perception of the real world must cope with motor and other noises caused by the robot’s own movements, in addition to environmental noise and reverberation. This paper presents the active direction-pass filter (ADPF), which separates sounds originating from a specified direction detected by a pair of microphones; the ADPF is thus based on directional processing, a technique also common in visual processing. The ADPF is implemented by hierarchical integration of visual and auditory processing, with hypothetical reasoning about the interaural phase difference (IPD) and interaural intensity difference (IID) in each sub-band. The resolution of sound localization and separation provided by the ADPF depends on the direction of the sound source: the resolving power is much higher for sounds coming from directly in front of the humanoid than for sounds coming from the periphery. This directional resolving property is similar to that of the eye, whereby the visual fovea at the center of the retina is capable of much higher resolution than the periphery of the retina. To exploit the corresponding "auditory fovea," the ADPF controls the direction of the head. Human tracking and sound source separation based on the ADPF are implemented on the upper torso of the humanoid and run in real time, using distributed processing on five PCs networked via Gigabit Ethernet. When the ADPF separated a mixture of two or three speech signals of the same volume, the signal-to-noise ratio (SNR) and noise reduction ratio of each separated sound were increased by about 2.2 dB and 9 dB, respectively.
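The core idea of a direction-pass filter — comparing the measured IPD in each frequency sub-band against the IPD expected for a source in the target direction, and passing only the sub-bands that match — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the free-field IPD model, the single-frame FFT, and all parameter names and values (`mic_dist`, `tol`, `n_fft`) are assumptions for the example.

```python
import numpy as np

def direction_pass_filter(left, right, fs, target_deg,
                          mic_dist=0.2, n_fft=512, tol=0.3):
    """Illustrative sub-band direction-pass filter (not the paper's code).

    For each frequency sub-band, the interaural phase difference (IPD)
    between the two microphone channels is compared with the IPD expected
    for a source at target_deg under a free-field two-microphone model;
    sub-bands whose IPD falls outside the pass range (tol radians) are
    suppressed, and the passed spectrum is transformed back to time domain.
    """
    c = 343.0                                  # speed of sound (m/s)
    # Single-frame spectra; a real-time system would use overlapping frames.
    L = np.fft.rfft(left[:n_fft])
    R = np.fft.rfft(right[:n_fft])
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    # Measured IPD per sub-band.
    ipd = np.angle(L * np.conj(R))
    # IPD predicted for a source at target_deg (free-field delay model).
    expected = 2 * np.pi * freqs * mic_dist * np.sin(np.radians(target_deg)) / c
    # Wrap the mismatch to [-pi, pi] and pass only matching sub-bands.
    diff = np.angle(np.exp(1j * (ipd - expected)))
    mask = np.abs(diff) < tol
    return np.fft.irfft(L * mask, n_fft)
```

For example, a source directly in front (0 degrees) produces zero IPD, so filtering identical left/right channels with `target_deg=0` retains the signal, while filtering with a far-off target direction suppresses it.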