Chris Drummond and Robert C. Holte
This paper investigates how the splitting criteria and pruning methods of decision tree learning algorithms are influenced by misclassification costs or changes to the class distribution. Splitting criteria that are relatively insensitive to costs (or class distributions) are found to perform, in terms of expected misclassification cost, as well as or better than splitting criteria that are cost sensitive. Consequently, there are two opposite ways of dealing with imbalance. One is to combine a cost-insensitive splitting criterion with a cost-insensitive pruning method, producing a decision tree algorithm little affected by costs or prior class distribution. The other is to grow a cost-independent tree which is then pruned in a cost-sensitive manner.
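The distinction the abstract draws can be made concrete with a small sketch. The code below is purely illustrative and not from the paper: it contrasts the Gini impurity, a standard cost-insensitive splitting criterion that depends only on class proportions, with a hypothetical cost-weighted node criterion whose value shifts when the false-negative and false-positive costs differ. The function names and the exact form of the cost criterion are assumptions made for illustration.

```python
def gini(counts):
    """Cost-insensitive impurity for a two-class node.

    Depends only on the class proportions, so its value is unchanged
    when misclassification costs change.
    """
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def expected_cost(counts, cost_fn, cost_fp):
    """Cost-sensitive node criterion (hypothetical form, for illustration).

    Label the node with whichever class minimises expected cost, then
    report that minimum cost per example. counts = (negatives, positives);
    cost_fn is the cost of a false negative, cost_fp of a false positive.
    """
    n = sum(counts)
    if n == 0:
        return 0.0
    neg, pos = counts
    # Predicting positive misclassifies every negative (cost_fp each);
    # predicting negative misclassifies every positive (cost_fn each).
    return min(neg * cost_fp, pos * cost_fn) / n


# A node with 8 negatives and 2 positives: Gini ignores the cost ratio,
# while the cost-sensitive criterion reacts to it.
node = (8, 2)
print(gini(node))                     # same value for any cost setting
print(expected_cost(node, 1, 1))      # equal costs
print(expected_cost(node, 10, 1))     # false negatives 10x as costly
```

Under equal costs the two criteria rank this node similarly, but raising the false-negative cost changes the cost-sensitive value while the Gini score stays fixed, which is the sensitivity the paper examines.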