Ted Briscoe and Nick Waegner
Development of a robust syntactic parser capable of returning the unique, correct and syntactically determinate analysis for arbitrary naturally-occurring input will require solutions to two critical problems with most, if not all, current wide-coverage parsing systems; namely, resolution of structural ambiguity and undergeneration. Typically, resolution of syntactic ambiguity has been conceived as the problem of representing and deploying non-syntactic (semantic, pragmatic, phonological) knowledge. However, this approach has not proved fruitful so far except in small and simple domains, and even in these cases it remains labour-intensive. In addition, some naturally-occurring sentences will not be correctly analysed (or analysed at all) by a parser deploying a generative grammar based on the assumption that the grammatical sentences of a natural language constitute a well-formed set. Little attention has been devoted to this latter problem; however, the increasing quantities of machine-readable text requiring linguistic classification, both for purposes of research and for information retrieval, make it increasingly topical. In this paper, we discuss the application of the Viterbi algorithm and the Baum-Welch algorithm (both in wide use for speech recognition) to the parsing problem, and describe a recent experiment designed to produce a simple, robust, stochastic parser which selects an appropriate analysis frequently enough to be useful and deals effectively with the problem of undergeneration. We focus on these stochastic algorithms because, although other statistically-based approaches have been proposed, they appear most promising: they are computationally tractable (in principle) and well integrated with formal language and automata theory.
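To make the role of the Viterbi algorithm concrete, the following is a minimal illustrative sketch (not the system described in the paper): Viterbi dynamic programming selects the single most probable hidden state sequence, here a part-of-speech tag sequence, for an observed word string under a simple hidden Markov model. All states and probabilities below are invented for the example.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable state (tag) sequence for `words`."""
    # best[t][s] = probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            # Choose the predecessor state maximising the path probability.
            prob, prev = max(
                (best[t - 1][r] * trans_p[r][s] * emit_p[s].get(words[t], 0.0), r)
                for r in states
            )
            best[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy model: two hypothetical tags with invented probabilities.
states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"flies": 0.4, "time": 0.6}, "V": {"flies": 0.7, "time": 0.3}}

print(viterbi(["time", "flies"], states, start_p, trans_p, emit_p))
# → ['N', 'V']
```

The same dynamic-programming idea underlies the Baum-Welch algorithm, which replaces the maximisation over paths with a summation in order to re-estimate model parameters from unannotated data.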