Principled Multilingual Grammars for Large Corpora

Sharon Flank, Paul Krause, Carol Van Ess-Dykema

In-depth text understanding for large-scale applications requires a broad-coverage, robust grammar. We describe a multilingual implementation of such a grammar, and its advantages over both principle-based parsing and ad-hoc grammar design. We show how X-bar theory and language-independent semantic constraints facilitate grammar development. Our implementation includes innovative handling of (1) syntactic gaps, (2) logical structure alternations, and (3) conjunctions. Each of these innovations enhances performance in both large-scale and multilingual natural language processing applications. Phrase structure grammars are hardly new. The novelty in this paper comes from the use of practical guidelines and real numbers based on our experience with three languages and tens of thousands of texts. The issue of grammar design is worth revisiting because of the increasing bifurcation between semantic phrase grammars on the one hand, and principle-based parsing in toy domains on the other. Semantic grammars are brittle and must be rewritten for each new domain and language; principle-based parsing is not yet mature enough for our applications. We offer an extensible, multilingual application of the traditional approach that extends theoretical linguistic insights to industrial-strength data.
