A Generalized Idiom Usage Recognition Model Based on Semantic Compatibility
Many idiomatic expressions can be used figuratively or literally depending on the context. A particular challenge of automatic idiom usage recognition is that idioms are, by their very nature, idiosyncratic in their usages. For this reason, most previous work on idiom usage recognition adopted a “per-idiom” classifier approach: a separate classifier must be trained for each idiomatic expression of interest, often with the aid of annotated training examples. This paper presents a transfer learning approach for developing a generalized model that recognizes whether an idiom is used figuratively or literally. Our work is based on the observation that most idioms, when taken literally, would be semantically at odds with their context to some degree. A quantified notion of semantic compatibility may therefore help to discern the intended usage of any arbitrary idiom. We propose a novel semantic compatibility model that adapts the training of a Continuous Bag-of-Words (CBOW) model for the purpose of idiom usage recognition. The model requires no annotated idiom usage examples for training. We perform evaluative experiments on two corpora. The results show that the proposed generalized model is competitive with state-of-the-art per-idiom models.
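The core intuition, that a literal reading of an idiom clashes with its context, can be illustrated with a minimal sketch. This is not the model proposed in the paper, which adapts CBOW *training* itself. The sketch assumes that CBOW-style context (input) and target (output) word vectors are already available. It scores compatibility as the average cosine similarity between the pooled context vector and the vectors of the idiom's component words. The toy two-dimensional embeddings, the function names, and the 0.5 decision threshold are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Embeddings = Dict[str, Vector]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors (CBOW-style context pooling)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def compatibility(idiom_tokens: Sequence[str], context_tokens: Sequence[str],
                  context_emb: Embeddings, target_emb: Embeddings) -> float:
    """Hypothetical score: how well the idiom's literal words fit the averaged context.

    Words missing from the embedding tables (e.g. function words) are skipped.
    """
    context = [context_emb[w] for w in context_tokens if w in context_emb]
    targets = [target_emb[w] for w in idiom_tokens if w in target_emb]
    if not context or not targets:
        raise ValueError("need at least one known context word and one known idiom word")
    ctx_vec = mean_vector(context)
    return sum(cosine(ctx_vec, t) for t in targets) / len(targets)


def classify_usage(idiom_tokens: Sequence[str], context_tokens: Sequence[str],
                   context_emb: Embeddings, target_emb: Embeddings,
                   threshold: float = 0.5) -> str:
    """Low compatibility with the context suggests a figurative reading."""
    score = compatibility(idiom_tokens, context_tokens, context_emb, target_emb)
    return "literal" if score >= threshold else "figurative"


# Toy 2-D vectors (hypothetical): dimension 0 ~ "cooking", dimension 1 ~ "secrecy".
TOY_EMB: Embeddings = {
    "spill": [1.0, 0.2],
    "beans": [1.0, 0.0],
    "cook": [0.9, 0.1],
    "kitchen": [1.0, 0.3],
    "secret": [0.0, 1.0],
    "reporter": [0.1, 0.9],
}

if __name__ == "__main__":
    idiom = ["spill", "the", "beans"]  # "the" has no vector and is skipped
    # The same toy table stands in for both CBOW input and output vectors here.
    print(classify_usage(idiom, ["cook", "kitchen"], TOY_EMB, TOY_EMB))
    print(classify_usage(idiom, ["secret", "reporter"], TOY_EMB, TOY_EMB))
```

With these toy vectors, the kitchen context yields a score near 0.99 and is labeled literal. The secrecy context yields a score near 0.15 and is labeled figurative. Because no per-idiom classifier is involved, the same scoring function applies to any idiom whose component words have vectors. This mirrors, in simplified form, the generalization the abstract describes.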