Xingquan Zhu, Xindong Wu, Taghi Khoshgoftaar, Shi Yong
In this paper, we perform an empirical study of the impact of noise on cost-sensitive (CS) learning by observing how a CS learner reacts to mislabeled training examples, in terms of both misclassification cost and classification accuracy. Our empirical results and theoretical analysis indicate that mislabeled training examples can raise serious concerns for cost-sensitive classification, especially when misclassifying some classes becomes extremely expensive. Compared to general inductive learning, noise handling and data cleansing are more crucial in CS learning and should be carefully investigated to ensure its success.
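To illustrate why misclassification cost and accuracy can tell different stories when one class is much more expensive to misclassify, here is a minimal sketch assuming a two-class problem and a hypothetical cost matrix in which missing a class-1 example costs 50 times more than a false alarm. The cost values, labels, and function names are illustrative only and are not taken from the paper's experiments.

```python
# Hypothetical cost matrix: COST[true][predicted].
# Misclassifying a true class-1 example as class 0 costs 50; the reverse costs 1.
COST = [[0, 1],
        [50, 0]]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_cost(y_true, y_pred, cost):
    """Mean misclassification cost per example under the given cost matrix."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
# Both prediction sets make exactly one error, so accuracy is identical...
pred_a = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0]  # error on a cheap class-0 example
pred_b = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]  # error on an expensive class-1 example

print(accuracy(y_true, pred_a), average_cost(y_true, pred_a, COST))  # 0.9 0.1
print(accuracy(y_true, pred_b), average_cost(y_true, pred_b, COST))  # 0.9 5.0
```

Both classifiers reach 90% accuracy, yet the second incurs a 50-fold higher average cost. Under this assumption, a few mislabeled examples that push a learner toward errors on the expensive class can hurt cost far more than accuracy alone would suggest.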
Subjects: 12. Machine Learning and Discovery; 15.6 Decision Trees
Submitted: Oct 13, 2006