Part of Speech Tagging Bilingual Speech Transcripts with Intrasentential Model Switching
Paul Rodrigues, Sandra Kübler

This paper investigates incremental part of speech tagging for speech transcripts that contain multilingual intrasentential code-mixing. It compares the accuracy of a monolithic tagging model trained on a heterogeneous-language dataset with that of a model that switches dynamically between two homogeneous-language tagging models using word-by-word language identification. We find that the dynamic model, even though it is presented with a smaller context consisting of sentence fragments, matches the accuracy of the monolithic code-mixing model, which has access to a larger context. Our system is modular and is designed to be extended to code-mixing among many languages.
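
The model-switching idea can be illustrated with a minimal sketch. It is not the system evaluated in this paper: the toy lexicons, the English/Spanish language pair, the tag labels, and the names `identify_language`, `make_tagger`, and `tag_code_mixed` are all assumptions made for illustration. Each word is assigned a language, consecutive words in the same language are grouped into a fragment, and each fragment is passed to the tagger for its language, so no tagger sees context beyond its own fragment.

```python
from itertools import groupby

# Hypothetical toy lexicons standing in for two trained monolingual taggers.
EN_TAGS = {"i": "PRON", "want": "VERB", "to": "PART", "eat": "VERB", "the": "DET"}
ES_TAGS = {"quiero": "VERB", "comer": "VERB", "tacos": "NOUN",
           "con": "ADP", "mis": "DET", "amigos": "NOUN"}


def identify_language(word, previous):
    """Toy word-level language identification (an assumption, not the paper's method).

    A word found in exactly one lexicon gets that language; an unknown or
    ambiguous word keeps the language of the preceding word.
    """
    w = word.lower()
    in_en, in_es = w in EN_TAGS, w in ES_TAGS
    if in_en and not in_es:
        return "en"
    if in_es and not in_en:
        return "es"
    return previous


def make_tagger(lexicon):
    """Build a stand-in monolingual tagger that only sees one fragment at a time."""
    def tag(fragment):
        return [(w, lexicon.get(w.lower(), "X")) for w in fragment]
    return tag


TAGGERS = {"en": make_tagger(EN_TAGS), "es": make_tagger(ES_TAGS)}


def tag_code_mixed(tokens, default="en"):
    """Label each word with a language, then tag each same-language fragment."""
    labels, current = [], default
    for tok in tokens:
        current = identify_language(tok, current)
        labels.append(current)

    tagged = []
    # groupby splits the sentence into maximal runs of one language.
    for lang, group in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        fragment = [word for word, _ in group]
        tagged.extend(TAGGERS[lang](fragment))
    return tagged


if __name__ == "__main__":
    print(tag_code_mixed("I want to comer tacos con the amigos".split()))
```

Supporting a further language in this sketch would mean adding one more entry to `TAGGERS` and extending `identify_language`, which mirrors the modular design described above.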


tagging, part of speech tagging, speech recognition, language identification

