Tom Armstrong, Tim Oates
Children are adept at discovering word boundaries and, in tandem, using those words to build higher-level structures. Current research, however, treats lexical acquisition and grammar induction as two distinct tasks, which has led to unreasonable assumptions. Existing work in grammar induction presupposes a perfectly segmented, noise-free lexicon, while lexical learning approaches largely ignore how the lexicon is used. This paper combines both tasks in a novel framework for bootstrapping lexical acquisition and grammar induction. We present an algorithm that iteratively learns a lexicon and a grammar for a class of regular languages in polynomial time, and we report experimental results for benchmark languages.
Subjects: 12. Machine Learning and Discovery; 13. Natural Language Processing
Submitted: Feb 21, 2008