Julian Kupiec and John Maxwell
The paper describes various aspects and practicalities of applying the "Hidden Markov" approach to train parameters of regular and context-free stochastic grammars. The approach enables grammars to be trained from unlabelled text corpora, providing flexibility in the choice of syntactic categories and text domain. Part-of-speech tagging and parsing are discussed as applications. Linguistic considerations can be used to develop constrained grammars, providing appropriate higher-order context for disambiguation. Unconstrained grammars provide the opportunity to capture patterns that are not covered by a specific grammar. Experimental results are discussed for these alternatives.