Selecting Minority Examples from Misclassified Data for Over-Sampling

Jorge de la Calleja, Olac Fuentes, Jesus Gonzalez

We introduce a method for learning from imbalanced data sets, in which examples of one class significantly outnumber examples of the other classes. Our method selects minority examples from the data misclassified by an ensemble of classifiers. These instances are then over-sampled to create new synthetic examples using a variant of the well-known SMOTE algorithm. To build the ensemble, we use bagging with locally weighted linear regression as the base learning algorithm. We tested our method on several data sets from the UCI Machine Learning Repository. In our experiments, the approach achieved better recall and precision than SMOTE.
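The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learner is a simple nearest-centroid classifier standing in for locally weighted linear regression, the selection rule (a minority example counts as misclassified when more than half of the bagged models get it wrong) is assumed because the abstract does not spell it out, and the over-sampling step is plain SMOTE-style interpolation rather than the paper's variant. All function and parameter names are hypothetical.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_centroid(X, y):
    # Stand-in base learner (assumption): one mean vector per class label.
    by_label = {}
    for xi, yi in zip(X, y):
        by_label.setdefault(yi, []).append(xi)
    return {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in by_label.items()
    }


def predict(model, x):
    # Predict the label whose centroid is closest to x.
    return min(model, key=lambda label: sq_dist(model[label], x))


def select_misclassified_minority(X, y, minority, n_models=11, rng=None):
    # Bagging: train each model on a bootstrap sample of the training set.
    rng = rng or random.Random(0)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            fit_nearest_centroid([X[i] for i in idx], [y[i] for i in idx])
        )
    # Assumed rule: keep minority examples that most models misclassify.
    selected = []
    for xi, yi in zip(X, y):
        if yi != minority:
            continue
        wrong = sum(predict(m, xi) != yi for m in models)
        if wrong > n_models / 2:
            selected.append(xi)
    return selected


def smote_like(seeds, minority_points, n_new_per_seed=2, k=3, rng=None):
    # Plain SMOTE-style interpolation (not the paper's variant): each new
    # point lies on the segment between a seed and one of its k nearest
    # minority neighbours.
    rng = rng or random.Random(0)
    synthetic = []
    for s in seeds:
        neighbours = sorted(
            (p for p in minority_points if p is not s),
            key=lambda p: sq_dist(s, p),
        )[:k]
        if not neighbours:
            continue
        for _ in range(n_new_per_seed):
            nb = rng.choice(neighbours)
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(s, nb)])
    return synthetic
```

In this sketch, the output of `select_misclassified_minority` would be passed as `seeds` to `smote_like`, together with the full list of minority examples, and the synthetic points would be added to the training set with the minority label before retraining.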

Subjects: 12. Machine Learning and Discovery; 1. Applications

Submitted: Feb 25, 2008

This page is copyrighted by AAAI. All rights reserved.