William Pentney, Marina Meila
In this paper, we apply spectral techniques to biological sequence data that has proved difficult to cluster effectively. To do so, we (1) extend spectral clustering algorithms to handle asymmetric affinities, such as the alignment scores used to compare biological sequences, and (2) devise a hierarchical algorithm that robustly handles many clusters of imbalanced sizes. We present an algorithm for clustering asymmetric affinity data and demonstrate its performance at recovering the higher levels of the Structural Classification of Proteins (SCOP) on a database of highly conserved subsequences.
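To make the notion of an asymmetric affinity concrete, the following is a minimal sketch of a common baseline and not the algorithm presented in this paper. It assumes a nonnegative score matrix with positive row sums, symmetrizes it by averaging (which discards exactly the directional information an asymmetric method would try to keep), and splits the items in two using the second eigenvector of the normalized affinity matrix. The function name `spectral_bipartition` and the toy `scores` matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def spectral_bipartition(affinity):
    """Split items into two groups from a possibly asymmetric affinity matrix.

    Baseline sketch only: assumes nonnegative entries and positive row sums.
    """
    A = np.asarray(affinity, dtype=float)
    # Assumption: symmetrize by averaging A[i, j] and A[j, i].
    # This throws away the direction of each score.
    S = 0.5 * (A + A.T)
    degrees = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetric normalized affinity D^{-1/2} S D^{-1/2}.
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(M)
    # Take the eigenvector of the second-largest eigenvalue and rescale it
    # to the corresponding eigenvector of the random-walk matrix D^{-1} S.
    v = eigenvectors[:, -2] * d_inv_sqrt
    # The sign pattern of v gives the two-way split.
    return (v > 0).astype(int)

# Hypothetical toy scores: items 0-2 and items 3-5 form two groups,
# and scores[i, j] differs from scores[j, i], as alignment scores can.
scores = np.array([
    [10.0, 8.0, 7.0, 0.5, 0.2, 0.1],
    [6.0, 10.0, 9.0, 0.3, 0.4, 0.2],
    [7.5, 7.0, 10.0, 0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3, 10.0, 8.5, 6.0],
    [0.4, 0.1, 0.2, 7.0, 10.0, 9.0],
    [0.3, 0.2, 0.5, 8.0, 6.5, 10.0],
])
labels = spectral_bipartition(scores)
print(labels)
```

Which group receives label 0 and which receives label 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. Applying a two-way split of this kind recursively is one simple way to obtain a hierarchy, though it is not necessarily how the hierarchical algorithm in the paper proceeds.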
Content Area: 12. Machine Learning
Subjects: 12. Machine Learning and Discovery; 12.2 Scientific Discovery
Submitted: May 10, 2005