Confidence Interval for the Difference in Classification Error

William Elazmeh, Nathalie Japkowicz, Stan Matwin

Evaluating classifiers with increased confidence can significantly impact the success of many machine learning applications. However, traditional machine learning evaluation measures fail to provide any level of confidence in their results. In this paper, we motivate the need for confidence in classifier evaluation at a level suitable for medical studies. We draw a parallel between case-control medical studies and classification in machine learning. We propose the use of Tango's biostatistical test to compute consistent confidence intervals for the difference in classification error between the two classes. Our experiments compare Tango's confidence intervals to accuracy, recall, precision, and the F-measure. Our results show that Tango's test provides a statistically sound notion of confidence and is more consistent and reliable than the above measures.
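Below is a minimal sketch of the kind of computation the abstract describes: inverting Tango's (1998) score test for paired proportions to obtain a confidence interval for the difference between two correlated error rates. The mapping of confusion-matrix cells to the discordant counts b and c, and the helper names tango_z and tango_ci, are illustrative assumptions on my part, not details taken from the paper.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def tango_z(b, c, n, delta):
    """Tango's score statistic for H0: p10 - p01 = delta.

    b, c : the two discordant counts of the paired 2x2 table
           (e.g., examples misclassified in one direction only,
           and in the other direction only).
    n    : total number of paired observations.
    """
    # Restricted MLE of p01 under the null constraint: positive root
    # of the quadratic  2n*q^2 + B*q + C = 0  (Tango, 1998).
    A = 2.0 * n
    B = -b - c + (2.0 * n - b + c) * delta
    C = -c * delta * (1.0 - delta)
    q = (np.sqrt(max(B * B - 4.0 * A * C, 0.0)) - B) / (2.0 * A)
    # Null variance of (b - c): n * (2*p01 + delta*(1 - delta)).
    var = n * (2.0 * q + delta * (1.0 - delta))
    return (b - c - n * delta) / np.sqrt(max(var, 1e-12))

def tango_ci(b, c, n, conf_level=0.95):
    """Score-based CI for delta = p10 - p01, obtained by finding the
    values of delta at which |tango_z| equals the normal critical value.
    Degenerate tables (e.g., b = c = 0) are not handled here."""
    z = norm.ppf(1.0 - (1.0 - conf_level) / 2.0)
    est = (b - c) / n
    eps = 1e-6
    lower = brentq(lambda d: tango_z(b, c, n, d) - z, -1.0 + eps, est)
    upper = brentq(lambda d: tango_z(b, c, n, d) + z, est, 1.0 - eps)
    return est, lower, upper

if __name__ == "__main__":
    # Hypothetical confusion-matrix counts: 12 false negatives,
    # 5 false positives, 200 test examples in total.
    print(tango_ci(b=12, c=5, n=200))
```

The interval is asymmetric around the point estimate (b - c)/n because the variance is re-estimated under each hypothesized value of delta; this is what makes score-type intervals better behaved than simple Wald intervals when counts are small or classes are imbalanced.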

Subjects: 12. Machine Learning and Discovery

Submitted: May 17, 2006

