Pedro Domingos, University of Washington
The bias-variance decomposition is a very useful and widely-used tool for understanding machine-learning algorithms. It was originally developed for squared loss. In recent years, several authors have proposed decompositions for zero-one loss, but each has significant shortcomings. In particular, all of these decompositions have only an intuitive relationship to the original squared-loss one. In this paper, we define bias and variance for an arbitrary loss function, and show that the resulting decomposition specializes to the standard one for the squared-loss case, and to a close relative of Kong and Dietterich’s (1995) one for the zero-one case. The same decomposition also applies to variable misclassification costs. We show a number of interesting consequences of the unified definition. For example, Schapire et al.’s (1997) notion of margin can be expressed as a function of the zero-one bias and variance, making it possible to formally relate a classifier ensemble’s generalization error to the base learner’s bias and variance on training examples. Experiments with the unified definition lead to further insights.