Enzo Mumolo, Massimiliano Nolich
In this paper we present a novel artificial auditory system for humanoid robots. We address the problem of estimating an articulatory representation of the speech of a talker who is addressing the robot. According to the motor theory of perception, the articulatory representation is the first step of a robust speech understanding process. The system is composed of two parts: a beam-forming module and a perception module. The beam-former has two channels (i.e. it is dual-microphone) and is based on the super-directive beam-forming algorithm. The environment is scanned to locate a sound source; once the direction of the source is found, the reception lobe of the dual-microphone system is steered in that direction and the signal is acquired. The perception module is based on a fuzzy computational model of human vocalization, in which the relationships between places of articulation and speech acoustic parameters are represented with fuzzy rules. Starting from the articulatory features, a set of acoustic parameters is generated according to the fuzzy rules. These acoustic parameters are used to generate a synthetic utterance, which is compared in the perceptual domain with the corresponding spoken utterance. The membership degrees of the articulatory features are then estimated by analysis-by-synthesis, using genetic optimization to minimize the perceptual distance between the synthetic and spoken utterances.
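The scanning step can be illustrated with a minimal sketch. It does not implement the super-directive beam-former described above; it substitutes a plain delay-and-sum beam-former, which is simpler and is used here only to show the idea of steering a two-microphone array over candidate directions and keeping the one with the highest output power. The function names, the microphone spacing, the sampling rate, the speed of sound and the sign convention for the angle are all assumptions made for this example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steered_power(left, right, lag, margin):
    """Power of the delay-and-sum output when `right` is advanced by `lag` samples.

    The same sample range is used for every lag so that powers are comparable.
    """
    total = 0.0
    for n in range(margin, len(left) - margin):
        y = left[n] + right[n + lag]
        total += y * y
    return total


def scan_for_source(left, right, mic_distance, sample_rate):
    """Scan all physically possible lags and return (best_lag, angle_in_degrees)."""
    # Largest inter-microphone delay, in whole samples, for a source on the array axis.
    max_lag = int(mic_distance * sample_rate / SPEED_OF_SOUND)
    best_lag = max(
        range(-max_lag, max_lag + 1),
        key=lambda lag: steered_power(left, right, lag, max_lag),
    )
    # Far-field model (assumed): delay = mic_distance * sin(angle) / speed of sound.
    sin_theta = best_lag * SPEED_OF_SOUND / (sample_rate * mic_distance)
    return best_lag, math.degrees(math.asin(sin_theta))


# Synthetic test signal: broadband noise reaching the right microphone 3 samples late.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
delay = 3
left = source
right = [0.0] * delay + source[:-delay]

lag, angle = scan_for_source(left, right, mic_distance=0.2, sample_rate=16000)
print(lag, round(angle, 1))
```

With integer lags the angular resolution is limited by the sampling rate and the microphone spacing; a practical system would likely interpolate between lags or work in the frequency domain.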
Subjects: 19.1 Perception; 17. Robotics