Mark A. Jones and Jason M. Eisner
We describe a general approach to the probabilistic parsing of context-free grammars. The method integrates context-sensitive statistical knowledge of various types (e.g., syntactic and semantic) and can be trained incrementally from a bracketed corpus. We introduce a variant of the GHR context-free recognition algorithm and explain how to adapt it for efficient probabilistic parsing. In split-corpus testing on a real-world corpus of sentences from software testing documents, with 20 possible parses for a sentence of average length, the system finds and identifies the correct parse in 96% of the sentences for which it finds any parse, while producing only 1.03 parses per sentence for those sentences. Significantly, this success rate would be only 79% without the semantic statistics.