Poe Xing, Casimir Kulikowski, Ilya Muchnik, Inna Dubchak, Denise M. Wolf, Sylvia Spengler, and Manfred Zorn
We present an analysis of multi-aligned eukaryotic and procaryotic small subunit rRNA sequences using a novel segmentation and clustering procedure capable of extracting subsets of sequences that share common sequence features. This procedure consists of: i) segmentation of aligned sequences using a dynamic programming procedure, and subsequent identification of likely conserved segments; ii) for each putative conserved segment, extraction of a locall homogeneous cluster using a novel polynomial procedure; and iii) intersection of clusters associated with each conserved segment. Aside from their utilit in processing large gap-filled multi-alignments, these algorithms can be applied to a broad spectrum of rRNA analysis functions such as subalignment, phylogenetic subtree extraction and construction, and organism tree-placement, and can serve as a framework to organize sequence data in an efficient and easily searchable manner. The sequence classification we obtained using the method presented here shows a remarkable consistency with the independently constructed eukaryotic phylogenetic tree.