Sholom M. Weiss, Nitin Indurkhya
We evaluate the performance of weakest-link pruning of decision trees using cross-validation. This technique maps tree pruning into a problem of tree selection: find the best (i.e., the right-sized) tree from a set of trees ranging in size from the unpruned tree to a null tree. For samples with at least 200 cases, extensive empirical evidence supports the following conclusions relative to tree selection: (a) 10-fold cross-validation is nearly unbiased; (b) not pruning a covering tree is highly biased; (c) 10-fold cross-validation is consistent with optimal tree selection for large sample sizes; and (d) the accuracy of tree selection by 10-fold cross-validation is largely dependent on sample size, irrespective of the population distribution.
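
To illustrate how the set of candidate trees arises, the following minimal Python sketch applies weakest-link pruning to a small hand-built tree, assuming the standard cost-complexity formulation in which the weakest link is the internal node whose removal raises the training error least per leaf removed. The `Node` class, the function names, and the error counts are hypothetical and chosen only for illustration; they are not taken from the experiments reported here. For simplicity the sketch collapses one internal node per step, and it omits the cross-validation step that would estimate the error of each candidate tree and select among them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical tree node: `errors` is the number of training cases
    # misclassified if this node were turned into a leaf.
    errors: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def leaves(t: Node) -> int:
    """Number of leaves in the subtree rooted at t."""
    return 1 if t.is_leaf() else leaves(t.left) + leaves(t.right)


def subtree_errors(t: Node) -> int:
    """Training errors made by the leaves of the subtree rooted at t."""
    if t.is_leaf():
        return t.errors
    return subtree_errors(t.left) + subtree_errors(t.right)


def internal_nodes(t: Node) -> List[Node]:
    """All non-leaf nodes of the subtree rooted at t, root first."""
    if t.is_leaf():
        return []
    return [t] + internal_nodes(t.left) + internal_nodes(t.right)


def link_strength(t: Node, n: int) -> float:
    """Increase in error rate per leaf removed if t is collapsed to a leaf."""
    return (t.errors - subtree_errors(t)) / (n * (leaves(t) - 1))


def pruning_sequence(root: Node, n: int) -> List[Tuple[float, int, int]]:
    """Prune `root` in place down to the null tree.

    Returns one (alpha, leaf count, training errors) triple per candidate
    tree, starting with the unpruned tree at alpha = 0.
    """
    seq = [(0.0, leaves(root), subtree_errors(root))]
    while not root.is_leaf():
        weakest = min(internal_nodes(root), key=lambda t: link_strength(t, n))
        alpha = link_strength(weakest, n)
        weakest.left = weakest.right = None  # collapse the weakest link
        seq.append((alpha, leaves(root), subtree_errors(root)))
    return seq


# Illustrative tree built on n = 100 training cases (made-up counts).
root = Node(40, Node(10, Node(4), Node(5)), Node(20, Node(4), Node(5)))
seq = pruning_sequence(root, n=100)
print([(size, errs) for _, size, errs in seq])  # [(4, 18), (3, 19), (1, 40)]
```

In this example the candidate set contains trees with 4, 3, and 1 leaves; no 2-leaf tree appears because collapsing the root becomes the weakest link before the remaining split does. Tree selection then amounts to choosing one member of this sequence, for instance by comparing 10-fold cross-validated error estimates.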