Jorge de la Calleja, Olac Fuentes, Jesus Gonzalez
We introduce a method for learning from imbalanced data sets, where examples of one class significantly outnumber examples of the other classes. Our method selects minority-class examples from the data misclassified by an ensemble of classifiers. These instances are then over-sampled to create new synthetic examples using a variant of the well-known SMOTE algorithm. To build the ensemble we use bagging, with locally weighted linear regression as the base learning algorithm. We tested our method on several data sets from the UCI machine learning repository. In our experiments, the approach achieved better recall and precision than SMOTE.
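The selection and over-sampling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify their SMOTE variant, so the code below uses the standard SMOTE interpolation between a seed point and one of its nearest minority-class neighbours. The ensemble predictions are assumed to be given as an array `y_pred`, and the function names, `k`, and `n_new_per_seed` are hypothetical choices made for illustration.

```python
import numpy as np


def select_misclassified_minority(X, y, y_pred, minority_label):
    """Return the minority-class rows of X that the ensemble got wrong.

    y_pred is assumed to hold the ensemble's predicted labels
    (how the ensemble is built is outside this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    y_pred = np.asarray(y_pred)
    mask = (y == minority_label) & (y_pred != y)
    return X[mask]


def smote_like_oversample(seeds, minority, n_new_per_seed=2, k=3, rng=None):
    """Create synthetic points by standard SMOTE-style interpolation.

    For each seed, pick one of its k nearest minority neighbours at random
    and place a new point at a random position on the segment between them.
    """
    rng = np.random.default_rng(rng)
    seeds = np.asarray(seeds, dtype=float)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for x in seeds:
        dists = np.linalg.norm(minority - x, axis=1)
        order = np.argsort(dists)
        # Skip points at distance 0 (the seed itself or exact duplicates).
        neighbours = [i for i in order if dists[i] > 0][:k]
        if not neighbours:
            continue
        for _ in range(n_new_per_seed):
            nb = minority[rng.choice(neighbours)]
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append(x + gap * (nb - x))
    return np.array(synthetic).reshape(-1, minority.shape[1])
```

A typical use would call `select_misclassified_minority` on the training data with the ensemble's predictions, pass the result as `seeds` together with all minority examples, and append the returned synthetic points to the training set with the minority label.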
Subjects: 12. Machine Learning and Discovery; 1. Applications
Submitted: Feb 25, 2008