James R. Curran and Raymond K. Wong
Recent work in computational linguistics has described a transformation-based learner that achieves impressive accuracy and speed while producing a lucid, concise representation of what it learns. This work presents a set-based formal model of ambiguity, tagging and the transformation-based learning paradigm. We apply the model to the automatic learning of document format generation and recognition on multiple levels of structural semantics. This application supports the general applicability of the model and results in a novel linear-time document format processor.
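To make the transformation-based learning paradigm concrete, the following is a minimal sketch of the generic greedy loop in the style of Brill's tagger. It is not the authors' set-based model. An initial tagger assigns each token a tag. The learner then repeatedly instantiates rule templates wherever the current tag is wrong, keeps the rule that most reduces the error count against the gold tags, and applies it. The single "previous tag" template, the `learn`/`apply_rule` helpers and the toy lexicon are hypothetical simplifications chosen for illustration.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical single template: change from_tag to to_tag when the previous tag is prev_tag.
Rule = Tuple[str, str, str]  # (from_tag, to_tag, prev_tag)


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Apply one transformation, matching against the tags as they were before this pass."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def errors(tags: List[str], gold: List[str]) -> int:
    """Count positions where the current tag disagrees with the gold tag."""
    return sum(t != g for t, g in zip(tags, gold))


def candidate_rules(tags: List[str], gold: List[str]) -> Set[Rule]:
    """Instantiate the template at every position whose current tag is wrong."""
    return {(tags[i], gold[i], tags[i - 1])
            for i in range(1, len(tags)) if tags[i] != gold[i]}


def learn(words: List[str], gold: List[str], initial: Callable[[str], str],
          max_rules: int = 10) -> Tuple[List[Rule], List[str]]:
    """Greedily learn an ordered list of transformations (assumed, simplified TBL loop)."""
    tags = [initial(w) for w in words]
    rules: List[Rule] = []
    while len(rules) < max_rules:
        base = errors(tags, gold)
        best, best_gain = None, 0
        for rule in sorted(candidate_rules(tags, gold)):  # sorted for determinism
            gain = base - errors(apply_rule(tags, rule), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule reduces the error count: stop
            break
        rules.append(best)
        tags = apply_rule(tags, best)
    return rules, tags


# Toy example: the initial (most-frequent-tag) lexicon mis-tags the noun "can".
lexicon = {"the": "DT", "can": "MD", "rusts": "VBZ", "I": "PRP", "swim": "VB"}
words = ["the", "can", "rusts", "I", "can", "swim"]
gold = ["DT", "NN", "VBZ", "PRP", "MD", "VB"]
rules, final_tags = learn(words, gold, lambda w: lexicon[w])
print(rules)  # [('MD', 'NN', 'DT')]
```

The learned rule ("retag MD as NN after DT") fixes the first "can" while leaving the modal "can" after "I" untouched. This illustrates the readable rule list that makes transformation-based learners attractive.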