Nitesh Chawla, David Cieslak
Decision trees, a popular choice for classification, have their limitation in providing good quality probability estimates. Typically, smoothing methods such as Laplace or m-estimate are applied at the decision tree leaves to overcome the systematic bias introduced by the frequency-based estimates. An ensemble of decision trees has also been shown to help in reducing the bias and variance in the leaf estimates, resulting in better calibrated probabilistic predictions. In this work, we evaluate the calibration or quality of these estimates using various loss measures. We also examine the relationship between the quality of such estimates and resulting rank-ordering of test instances. Our results quantify the impact of smoothing in terms of the loss measures, and the coupled relationship with the AUC measure.
Subjects: 12. Machine Learning and Discovery; 15.6 Decision Trees
Submitted: May 25, 2006