Error-Based and Entropy-Based Discretization of Continuous Features

Ron Kohavi, Mehran Sahami

We present a comparison of error-based and entropy-based methods for discretization of continuous features. Our study includes both an extensive empirical comparison and an analysis of scenarios where error minimization may be an inappropriate discretization criterion. We present a discretization method based on the C4.5 decision tree algorithm and compare it to an existing entropy-based discretization algorithm, which employs the Minimum Description Length Principle, and to a recently proposed error-based technique. We evaluate these discretization methods with respect to C4.5 and Naive-Bayesian classifiers on datasets from the UCI repository and analyze the computational complexity of each method. Our results indicate that the entropy-based MDL heuristic outperforms error minimization on average. We then analyze the shortcomings of error-based approaches in comparison to entropy-based methods.
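To make the contrast between the two criteria concrete, the following minimal Python sketch scores a single candidate cut point by both training error and class entropy. It is an illustration only, not the authors' implementation: the function names (`entropy`, `errors`, `split_scores`) and the toy data are assumptions introduced here, and the sketch omits the MDL stopping criterion and any recursive partitioning. It shows one scenario of the kind the abstract alludes to: when both sides of a cut share the same majority class, the error count does not change, so an error-based criterion has no reason to place the cut, whereas entropy still registers the change in class distribution.

```python
import math
from collections import Counter


def entropy(labels):
    """Class entropy (in bits) of a list of labels; 0.0 for an empty list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def errors(labels):
    """Training errors if the interval predicts its majority class."""
    if not labels:
        return 0
    return len(labels) - max(Counter(labels).values())


def split_scores(values, labels, cut):
    """Return (weighted entropy, total errors) after splitting at `cut`."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    weighted_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return weighted_entropy, errors(left) + errors(right)


# Hypothetical toy feature: the left half is pure, the right half is mixed,
# but class "a" is the majority on both sides of the cut at 5.
values = list(range(1, 11))
labels = ["a"] * 5 + ["a", "a", "a", "b", "b"]

print("no cut:   entropy=%.3f errors=%d" % (entropy(labels), errors(labels)))
print("cut at 5: entropy=%.3f errors=%d" % split_scores(values, labels, 5))
```

On this toy data the error count stays at 2 with or without the cut, while the weighted entropy drops, so only the entropy-based score distinguishes the pure interval from the mixed one.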


This page is copyrighted by AAAI. All rights reserved.