Shehroz Khan, Dr. Shri Kant
The clustering accuracy of partitional clustering algorithms for categorical data depends primarily on the choice of initial data points (modes) used to start the clustering process. Traditionally, initial modes are chosen randomly. As a consequence, the clustering results cannot be reproduced consistently. In this paper we present an approach to compute initial modes for the K-modes clustering algorithm to cluster categorical data sets. Here, we utilize the idea of Evidence Accumulation for combining the results of multiple clusterings. Initially, the n F-dimensional data objects are decomposed into a large number of compact clusters; the K-modes algorithm performs this decomposition, with several clusterings obtained by N random initializations of the K-modes algorithm. The modes obtained from every run of random initialization are stored in a Mode-Pool, PN. The objective is to investigate the contribution of those data objects/patterns that are less vulnerable to the random selection of modes, and to choose the most diverse set of modes from the available Mode-Pool that can be used as initial modes for the K-modes clustering algorithm. Experimentally, we found that this method yields initial modes that are very similar to the actual/desired modes and gives consistent and better clustering results, with less variance in clustering error, than the traditional method of choosing random modes.
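The pipeline described above (N randomly initialized K-modes runs, a Mode-Pool of the resulting modes, and selection of a diverse subset as initial modes) might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the most diverse modes are chosen, so the greedy farthest-point selection under Hamming distance used here is an assumption, as are all function names (`hamming`, `k_modes`, `build_mode_pool`, `pick_diverse_modes`) and the parameters `max_iter` and `seed`.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, k, rng, max_iter=20):
    """Plain K-modes with random initial modes; returns the final modes."""
    modes = rng.sample(data, k)  # random initialization
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for row in data:
            nearest = min(range(k), key=lambda j: hamming(row, modes[j]))
            clusters[nearest].append(row)
        new_modes = []
        for j, members in enumerate(clusters):
            if not members:
                new_modes.append(modes[j])  # keep the old mode if a cluster empties
                continue
            # The mode takes the most frequent category in each attribute.
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:
            break
        modes = new_modes
    return modes


def build_mode_pool(data, k, n_runs, seed=0):
    """Collect the modes of n_runs randomly initialized K-modes runs."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_runs):
        pool.extend(k_modes(data, k, rng))
    return pool


def pick_diverse_modes(pool, k):
    """ASSUMPTION: greedy farthest-point selection, not taken from the paper.

    Start from the mode that occurs most often in the pool, then repeatedly
    add the candidate whose smallest Hamming distance to the chosen modes
    is largest.
    """
    counts = Counter(pool)
    candidates = list(counts)
    chosen = [max(candidates, key=lambda m: counts[m])]
    while len(chosen) < k and len(chosen) < len(candidates):
        rest = [m for m in candidates if m not in chosen]
        chosen.append(max(rest, key=lambda m: min(hamming(m, c) for c in chosen)))
    return chosen
```

Here `data` is assumed to be a list of equal-length tuples of categorical values. The modes returned by `pick_diverse_modes` would then replace the random sample as the starting modes of a final K-modes run, which is what makes the result repeatable for a fixed pool.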
Subjects: 12. Machine Learning and Discovery; 1. Applications
Submitted: Oct 11, 2006