Using Validation Sets to Avoid Overfitting in AdaBoost

Tom Bylander, Lisa Tate

AdaBoost is a well-known, effective technique for increasing the accuracy of learning algorithms. However, because its objective is to minimize error on the training set, it can overfit that set. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. Half of the training set is removed to form the validation set. The sequence of base classifiers that AdaBoost produces from the remaining training set is applied to the validation set, creating a modified set of weights. The training and validation sets are then switched, and a second pass is performed. The final classifier votes using both sets of weights. We show that our algorithm has similar performance on standard datasets and improved performance when classification noise is added.
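
The two-pass procedure described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation. The abstract does not say how the modified weights are computed, so the sketch assumes they are obtained by replaying the standard AdaBoost weight update on the validation half with the base classifiers held fixed. Decision stumps as base classifiers, labels in {-1, +1}, and all function names (`best_stump`, `reweight`, `validation_alphas`, `fit_two_pass`) are likewise assumptions made for the example.

```python
import math

def stump_predict(stump, x):
    # A stump is (feature index, threshold, sign); it predicts sign when x[feature] > threshold.
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign

def best_stump(X, y, w):
    # Exhaustive search for the stump with the lowest weighted error.
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, t, s), xi) != yi)
                if err < best_err:
                    best, best_err = (f, t, s), err
    return best

def reweight(stump, X, y, w):
    # One standard AdaBoost step: vote weight alpha and renormalized example weights.
    preds = [stump_predict(stump, xi) for xi in X]
    err = sum(wi for p, yi, wi in zip(preds, y, w) if p != yi)
    err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)
    new_w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
    total = sum(new_w)
    return alpha, [wi / total for wi in new_w]

def adaboost_stumps(X, y, rounds):
    # Ordinary AdaBoost on the training half; only the classifier sequence is kept.
    w = [1.0 / len(X)] * len(X)
    stumps = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        _, w = reweight(stump, X, y, w)
        stumps.append(stump)
    return stumps

def validation_alphas(stumps, X, y):
    # ASSUMPTION: the "modified set of weights" comes from replaying the
    # AdaBoost update on the validation half with the classifiers fixed.
    w = [1.0 / len(X)] * len(X)
    alphas = []
    for stump in stumps:
        alpha, w = reweight(stump, X, y, w)
        alphas.append(alpha)
    return alphas

def fit_two_pass(X, y, rounds=10):
    # Split in half, train on one half, reweight on the other, then swap roles.
    half = len(X) // 2
    a, b = (X[:half], y[:half]), (X[half:], y[half:])
    model = []
    for (Xt, yt), (Xv, yv) in ((a, b), (b, a)):
        stumps = adaboost_stumps(Xt, yt, rounds)
        model += list(zip(validation_alphas(stumps, Xv, yv), stumps))
    return model

def predict(model, x):
    # The final classifier votes with both sets of validation-derived weights.
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): one feature, label +1 when the value is at least 10.
X = [[(i * 7) % 20] for i in range(20)]
y = [1 if x[0] >= 10 else -1 for x in X]
model = fit_two_pass(X, y, rounds=3)
```

In this sketch a base classifier that does worse than chance on the validation half receives a negative vote weight, so its vote is inverted rather than discarded. Whether the paper handles that case the same way is not stated in the abstract.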

Subjects: 12. Machine Learning and Discovery

Submitted: Feb 10, 2006
