Wei-Hao Lin, Rong Jin, and Alexander Hauptmann
Our personal conversation memory agent is a wearable experience collection system that unobtrusively records the wearer’s conversations, recognizes the face of the dialog partner, and remembers his or her voice. When the system sees the same person’s face or hears the same voice, it uses a summary of the last conversation with this person to remind the wearer. To correctly identify a person and help the wearer recall the earlier conversation, the system must be aware of the current situation, as analyzed from the audio and video streams, and must classify the situation by combining these modalities. Multimodal classifiers, however, are relatively unstable in uncontrolled real-world environments, and a simple linear interpolation of multiple classification judgments cannot combine them effectively. We propose meta-classification using a Support Vector Machine as a new combination strategy. Experimental results show that combining face recognition and speaker identification by meta-classification is dramatically more effective than a linear combination. This meta-classification approach is general enough to be applied to any situation-aware application that needs to combine multiple classifiers.
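To make the contrast between the two combination strategies concrete, the following minimal sketch places a fixed-weight linear interpolation next to a learned meta-classifier. It is an illustration under stated assumptions, not the system described above: the function names, the two-dimensional feature vector (one face score and one voice score per hypothesized identity), the toy training data, and the hyperparameters are all hypothetical, and the linear SVM is fit with a simple subgradient-descent loop rather than a production solver.

```python
# Illustrative sketch only: toy scores, hypothetical names, simplified trainer.


def linear_interpolation(face_score, voice_score, alpha=0.5):
    """Baseline: fixed-weight interpolation of the two classifier scores."""
    return alpha * face_score + (1.0 - alpha) * voice_score


def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit a linear SVM by subgradient descent on the regularized hinge loss.

    samples: list of score vectors, here [face_score, voice_score] (assumed)
    labels:  +1 if the hypothesized identity is correct, -1 otherwise
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Margin is computed with the weights before this step's update.
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Shrink the weights (gradient of the L2 regularizer).
            weights = [w - lr * reg * w for w in weights]
            # The hinge loss contributes only when the margin is below 1.
            if margin < 1.0:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def meta_decision(weights, bias, scores):
    """Signed decision value of the meta-classifier; positive means 'accept'."""
    return sum(w * s for w, s in zip(weights, scores)) + bias


# Made-up training data: [face_score, voice_score] for a hypothesized identity.
TRAIN_SCORES = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1]]
TRAIN_LABELS = [1, -1, 1, -1]

if __name__ == "__main__":
    weights, bias = train_linear_svm(TRAIN_SCORES, TRAIN_LABELS)
    new_scores = [0.85, 0.75]
    print("interpolated score:", linear_interpolation(*new_scores))
    print("meta-classifier decision:", meta_decision(weights, bias, new_scores))
```

In this sketch the interpolation weight `alpha` is fixed by hand, whereas the meta-classifier learns its weights and bias from labeled score vectors. A practical system would more likely use an established SVM implementation (for example `sklearn.svm.SVC`) and a richer feature vector than the two scores assumed here.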