Zhiwu Lu, Yuxin Peng, Jianguo Xiao
This paper presents a fast simulated annealing framework for combining multiple clusterings (i.e., a clustering ensemble) based on measures of agreement between partitions. These measures were originally used to evaluate a clustering algorithm by comparing the clustering it obtains against a ground-truth clustering. Although a greedy strategy can optimize these measures as objective functions for the clustering ensemble, it may get trapped in local optima and its computational cost is high. To avoid local optima, we consider a simulated annealing optimization scheme that operates through single label changes. Moreover, measures based on the relationship (joined or separated) of pairs of objects, such as the Rand index, can be updated incrementally for each label change, which keeps the simulated annealing scheme computationally feasible. Experiments on simulated and real-life data demonstrate that the proposed framework can achieve superior results.
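As a rough illustration of the incremental update mentioned in the abstract, the sketch below tracks the number of object pairs on which a candidate partition and one reference partition agree (the numerator of the Rand index), and adjusts that count in constant time when a single object changes label. It is not the authors' implementation: the names `IncrementalRand` and `anneal`, the geometric cooling schedule, the unweighted sum over ensemble members, and all parameter values are assumptions made for this example.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb, exp


def rand_agreements(a, b):
    """Brute-force count of pairs that are joined in both or separated in both."""
    return sum((a[i] == a[j]) == (b[i] == b[j])
               for i, j in combinations(range(len(a)), 2))


class IncrementalRand:
    """Tracks agreeing pairs between a mutable labeling and a fixed reference."""

    def __init__(self, labels, reference):
        self.labels = list(labels)
        self.reference = list(reference)
        self.size = Counter(self.labels)                       # cluster sizes
        self.cont = Counter(zip(self.labels, self.reference))  # contingency table
        self.agree = rand_agreements(self.labels, self.reference)

    def delta(self, x, q):
        """Change in agreeing pairs if object x moved to label q (no mutation)."""
        p, j = self.labels[x], self.reference[x]
        if p == q:
            return 0
        return (self.size[p] - self.size[q] + 1
                + 2 * (self.cont[(q, j)] - self.cont[(p, j)]))

    def move(self, x, q):
        """Apply the label change and update the counts incrementally."""
        d = self.delta(x, q)
        p, j = self.labels[x], self.reference[x]
        if p != q:
            self.size[p] -= 1
            self.size[q] += 1
            self.cont[(p, j)] -= 1
            self.cont[(q, j)] += 1
            self.labels[x] = q
            self.agree += d
        return d

    def rand_index(self):
        # Assumes at least two objects, so the number of pairs is non-zero.
        return self.agree / comb(len(self.labels), 2)


def anneal(labels, references, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over single label changes (assumed schedule).

    Maximizes the summed pair agreement with every reference partition.
    """
    rng = random.Random(seed)
    trackers = [IncrementalRand(labels, ref) for ref in references]
    t = t0
    for _ in range(steps):
        x = rng.randrange(len(labels))   # object to relabel
        q = rng.randrange(k)             # proposed new label
        gain = sum(tr.delta(x, q) for tr in trackers)
        # Always accept improvements; accept losses with Boltzmann probability.
        if gain >= 0 or rng.random() < exp(gain / t):
            for tr in trackers:
                tr.move(x, q)
        t *= cooling
    return trackers[0].labels, sum(tr.agree for tr in trackers)
```

Because `delta` reads only two cluster sizes and two contingency cells, each proposed move costs time proportional to the number of ensemble members rather than to the number of object pairs, which is the property the abstract relies on to keep annealing feasible.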
Subjects: 12. Machine Learning and Discovery; 9.3 Mathematical Foundations
Submitted: Apr 10, 2008