Exploiting Diversity for Natural Language Processing

John C. Henderson

The recent popularity of applying machine learning methods to computational linguistics problems has given rise to a large supply of trainable natural language processing systems. For most problems of interest there is an array of off-the-shelf products or downloadable code implementing solutions of varying quality using varying techniques. This thesis is concerned with developing reasonable methods for combining the outputs of a diverse set of systems that all address the same task. The hope is that if the systems in the set have high enough initial accuracy and independently assorted errors, we can produce a more accurate solution using an appropriate combining method. In addition, there are principles that initial system developers should keep in mind, which will help them create a family of diverse systems. We are also interested in developing methods for increasing the diversity of a set of systems without sacrificing accuracy, for the sake of fruitful combination. Each task we approach will warrant a separate investigation into how to combine outputs, and we hope to discover the principles that are common among all tasks. We don’t want to study just one learning method or task; instead, we want to discover principles that can be applied universally, or to a broad class of problems.
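As a rough illustration of why accuracy and independent errors matter for combination, the sketch below uses simple majority voting on a binary decision. Majority voting is an assumption chosen here for illustration and is not necessarily the combining method the thesis develops; the function names `majority_vote` and `majority_accuracy` are hypothetical.

```python
from collections import Counter
from math import comb


def majority_vote(labels):
    """Return the most frequent label among the systems' outputs.

    Ties go to the label that appears first in the input.
    """
    return Counter(labels).most_common(1)[0][0]


def majority_accuracy(p, n):
    """Probability that a strict majority of n systems is correct.

    Assumes (for illustration only) a binary task, an odd n, and
    systems that are each correct with probability p and make
    errors independently of one another.
    """
    needed = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    # Three hypothetical taggers disagree on one token.
    print(majority_vote(["NN", "VB", "NN"]))
    # Three independent systems at 80% accuracy each.
    print(round(majority_accuracy(0.8, 3), 3))
    # Three independent systems at 40% accuracy each.
    print(round(majority_accuracy(0.4, 3), 3))
```

Under these assumptions, three systems that are each 80% accurate yield a majority that is about 89.6% accurate, while three systems at 40% yield a majority that is only about 35.2% accurate. Combination helps only when the initial accuracy is high enough, and real systems rarely have fully independent errors.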

