Phylogenetic Inference in Protein Superfamilies: Analysis of SH2 Domains

Kimmen Sjölander

This work focuses on the inference of evolutionary relationships in protein superfamilies, and on the use of these relationships to identify key positions in the structure, to infer attributes on the basis of evolutionary distance, and to identify potential errors in sequence annotations. Relative entropy, a distance metric from information theory, is used in combination with Dirichlet mixture priors to estimate a phylogenetic tree for a set of proteins. This method infers key structural or functional positions in the molecule, and guides the tree topology to preserve these important positions within subtrees. Minimum-description-length principles are used to determine a cut of the tree into subtrees, identifying the subfamilies in the data. The method is demonstrated on SH2-domain-containing proteins, resulting in a new subfamily assignment for Src2_drome and a suggested evolutionary relationship between Nck_human and Drk_drome, Sem5_caeel, Grb2_human, and Grb2_chick.
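The two ingredients named above, Dirichlet mixture priors and relative entropy, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the two are combined to build the tree, so the code only shows the standard posterior-mean estimate of a column's residue distribution under a Dirichlet mixture, followed by the relative entropy between two such estimates. The four-letter alphabet, the mixture weights, the component parameters, and the function names are all hypothetical values chosen for illustration.

```python
import math


def log_prob_counts(counts, alpha):
    """Log probability of a count vector under one Dirichlet component.

    The multinomial coefficient is omitted because it is identical for
    every component and cancels when the responsibilities are normalized.
    """
    n, a = sum(counts), sum(alpha)
    result = math.lgamma(a) - math.lgamma(n + a)
    for c, al in zip(counts, alpha):
        result += math.lgamma(c + al) - math.lgamma(al)
    return result


def posterior_mean(counts, weights, alphas):
    """Posterior-mean residue distribution under a Dirichlet mixture prior."""
    logs = [math.log(w) + log_prob_counts(counts, al)
            for w, al in zip(weights, alphas)]
    top = max(logs)  # subtract the maximum for numerical stability
    resp = [math.exp(v - top) for v in logs]
    total = sum(resp)
    resp = [r / total for r in resp]  # posterior weight of each component

    n = sum(counts)
    estimate = [0.0] * len(counts)
    for r, al in zip(resp, alphas):
        a = sum(al)
        for i, (c, x) in enumerate(zip(counts, al)):
            estimate[i] += r * (c + x) / (n + a)
    return estimate


def relative_entropy(p, q):
    """Relative entropy D(p || q) in nats; q must be positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical two-component mixture over a toy four-letter alphabet.
WEIGHTS = [0.6, 0.4]
ALPHAS = [[2.0, 0.1, 0.1, 0.1],   # component favoring the first letter
          [0.5, 0.5, 0.5, 0.5]]   # near-uniform component

# Observed counts in the same alignment column for two hypothetical subtrees.
col_a = [5, 0, 0, 0]
col_b = [1, 1, 2, 1]

p = posterior_mean(col_a, WEIGHTS, ALPHAS)
q = posterior_mean(col_b, WEIGHTS, ALPHAS)
divergence = relative_entropy(p, q)
```

Because the prior parameters are strictly positive, every estimated probability is nonzero, so the relative entropy stays finite even when a residue is unobserved in one of the two columns. Note that relative entropy is asymmetric: `relative_entropy(p, q)` and `relative_entropy(q, p)` generally differ.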
