Occam’s Razor and a Non-Syntactic Measure of Decision Tree Complexity

Goutam Paul

Occam’s razor, attributed to the fourteenth century English philosopher William of Occam, states: “plurality should not be assumed without necessity.” The machine learning interpretation of Occam’s razor is that if two models have the same performance on the training set, choose the simpler. Decision tree learning widely uses Occam’s razor. Popular decision tree generating algorithms are based on information gain criterion which inherently prefers shorter trees (Mitchel 1997). Furthermore, decision tree pruning is common regardless of the splitting criterion. Experiments suggest that shorter trees indeed have better generalization accuracy (GA), typically estimated by a validation set prediction accuracy. However, some case studies show evidence apparently against Occam’s razor. Recently, Webb (1996) has built C4.5X, a version of C4.5 decision tree classifier (Quinlan 1993) with a postprocessor, which adds more nodes and branches to the tree generated by basic C4.5. He showed that though C4.5 and C4.5X have identical training set accuracies, the generalization accuracy over some datasets is better for C4.5X. But Webb’s argument is based on the traditional syntactic complexity measure (number of nodes) of decision trees. In this paper, we explore a non-syntactic measure of decision tree complexity using the notion of Kolmogorov Complexity (Kolmogorov 1965) and show that in this measure the complexity of C4.5X tree is less than that of C4.5 tree on average. Hence, according to our measure of complexity, C4.5X does not violate Occam’s razor.

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.