f-Similarity Preservation Loss for Soft Labels: A Demonstration on Cross-Corpus Speech Emotion Recognition
In this paper, we propose a Deep Metric Learning (DML) approach that supports soft labels. DML seeks to learn representations that encode the similarity between examples through deep neural networks. DML generally presupposes that data can be divided into discrete classes using hard labels. However, some tasks, such as our exemplary domain of speech emotion recognition (SER), work with inherently subjective data, data for which it may not be possible to identify a single hard label. We propose a family of loss functions, fSimilarity Preservation Loss (f-SPL), based on the dual form of f-divergence for DML with soft labels. We show that the minimizer of f-SPL preserves the pairwise label similarities in the learned feature embeddings. We demonstrate the efficacy of the proposed loss function on the task of cross-corpus SER with soft labels. Our approach, which combines f-SPL and classification loss, significantly outperforms a baseline SER system with the same structure but trained with only classification loss in most experiments. We show that the presented techniques are more robust to over-training and can learn an embedding space in which the similarity between examples is meaningful.