K. M. Ho, P. D. Scott
This paper introduces a new technique for discretization of continuous variables based on zeta, a measure of strength of association between nominal variables developed for this purpose. Zeta is defined as the maximal accuracy achievable if each value of an independent variable must predict a different value of a dependent variable. We describe both how a continuous variable may be dichotomised by searching for a maximum value of zeta, and how a heuristic extension of this method can partition a continuous variable into more than two categories. Experimental comparisons with other published methods show that zeta-discretization runs considerably faster than other techniques without any loss of accuracy.
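
As an illustration of the definition above, the following minimal Python sketch computes zeta for two nominal variables and uses it to dichotomise a continuous variable. It is not the authors' implementation: the function names `zeta` and `best_cut` are hypothetical, the brute-force enumeration of one-to-one assignments is only practical for a handful of categories, and scanning every midpoint between adjacent distinct values is an assumed search strategy. The heuristic extension to more than two categories is not shown.

```python
from itertools import permutations


def zeta(xs, ys):
    """Best accuracy when each distinct x value must predict a distinct y value.

    Brute force over one-to-one assignments (an illustrative assumption,
    suitable only for small numbers of categories).
    """
    x_vals = sorted(set(xs))
    y_vals = sorted(set(ys))
    if len(x_vals) > len(y_vals):
        raise ValueError("need at least as many y values as x values")

    # Contingency counts for each (x, y) pair.
    counts = {}
    for x, y in zip(xs, ys):
        counts[(x, y)] = counts.get((x, y), 0) + 1

    best = 0
    # Each permutation assigns a different y value to every x value.
    for assigned in permutations(y_vals, len(x_vals)):
        correct = sum(counts.get((x, y), 0) for x, y in zip(x_vals, assigned))
        best = max(best, correct)
    return best / len(xs)


def best_cut(values, labels):
    """Return (cut, zeta) for the two-way split of `values` with maximal zeta.

    Candidate cuts are midpoints between adjacent distinct values
    (an assumed search strategy).
    """
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        side = [v > cut for v in values]  # False = below cut, True = above
        z = zeta(side, labels)
        if z > best[1]:
            best = (cut, z)
    return best
```

For example, `best_cut([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"])` selects the cut at 3.5, where the two intervals predict the two classes perfectly.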