Ramesh Subramonian, Ramana Venkata, and Joyce Chen, Intel Corporation
Discretization is the process of dividing a continuous-valued base attribute into discrete intervals that highlight distinct patterns in the behavior of a related goal attribute. In this paper, we present an integrated visual framework in which users can experiment with several discretization strategies and which helps them determine, intuitively, the appropriate number and locations of intervals. In addition to methods based on minimizing classification error or entropy, we introduce (i) an optimal algorithm that minimizes the approximation error introduced by discretization and (ii) a novel algorithm that uses clustering, an unsupervised learning technique, to identify intervals. We also extend discretization to work with continuous-valued goal attributes.
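To make the entropy-based criterion concrete, the following is a minimal sketch of choosing a single cut point on a base attribute so that the weighted class entropy of the goal attribute is minimized. It is an illustration under stated assumptions, not the algorithm of this paper: the function names `entropy` and `best_cut` are hypothetical, the goal attribute is assumed to be categorical, and only one binary split is considered.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, weighted_entropy) for the best single binary split.

    Hypothetical helper: candidate cuts are midpoints between consecutive
    distinct values of the base attribute. If no candidate lowers the
    entropy, the result is (None, entropy of all labels).
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = (None, entropy(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between equal base-attribute values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Entropy of each side, weighted by the fraction of points it holds.
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


# Toy data (invented for illustration): two well-separated classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
cut, score = best_cut(values, labels)
```

Applying such a split recursively to each resulting interval, with some stopping rule, would yield multiple intervals; the choice of stopping rule is left open here.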