Usama M. Fayyad, Keki B. Irani
We address the problem of selecting an attribute and some of its values for branching during the top-down generation of decision trees. We study the class of impurity measures, members of which are typically used in the literature for selecting attributes during decision tree generation (e.g. entropy in ID3, GID3*, and CART; Gini Index in CART). We argue that this class of measures is not particularly suitable for use in classification learning. We define a new class of measures, called C-SEP, that we argue is better suited for the purposes of class separation. A new measure from C-SEP is formulated and some of its desirable properties are shown. Finally, we demonstrate empirically that the new algorithm, O-BTree, that uses this measure indeed produces better decision trees than algorithms that use impurity measures.