Predicting Enzyme Function from Sequence: A Systematic Appraisal

Imran Shah and Lawrence Hunter

Homologous proteins do not necessarily exhibit identical biochemical function. Despite this, local or global sequence similarity is widely used as an indicator of functional identity. Of the 1327 functional classes defined by the Enzyme Commission that have more than one annotated example in the sequence databases, similarity scores alone are inadequate for 251 (19%). We test the hypothesis that conserved domains, as defined in the ProDom database, can be used to discriminate between alternative functions for homologous proteins in these cases. Using machine learning methods, we were able to induce correct discriminators for more than half of these 251 challenging functional classes. These results show that combining modular representations of proteins with sequence similarity improves the ability to infer function from sequence, compared with similarity scores alone.

Keywords: protein function; protein sequence; protein modules; Enzyme Commission; representation; machine learning
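To make the idea of a domain-based discriminator concrete, the following is a minimal sketch, not the authors' method: the abstract does not say which learning algorithm or feature encoding was used. It assumes each homologous protein is represented as a set of domain identifiers and searches for a single domain whose presence separates one functional class from the others. The domain names (`D1` to `D4`), the class labels, and the function `find_discriminating_domain` are all hypothetical and do not correspond to real ProDom entries or Enzyme Commission classes.

```python
from typing import FrozenSet, List, Optional, Tuple

# A protein is (set of domain IDs, functional class label).
# All identifiers below are made up for illustration only.
Protein = Tuple[FrozenSet[str], str]

PROTEINS: List[Protein] = [
    (frozenset({"D1", "D2"}), "class_A"),
    (frozenset({"D1", "D2", "D4"}), "class_A"),
    (frozenset({"D1", "D3"}), "class_B"),
    (frozenset({"D1", "D3", "D4"}), "class_B"),
]


def find_discriminating_domain(
    proteins: List[Protein], target_class: str
) -> Optional[str]:
    """Return a domain present in every protein of target_class and in
    no protein of any other class, or None if no such domain exists."""
    all_domains = sorted(set().union(*(domains for domains, _ in proteins)))
    for domain in all_domains:
        # Class membership of the proteins that carry this domain.
        with_domain = {
            label == target_class
            for domains, label in proteins
            if domain in domains
        }
        # Class membership of the proteins that lack this domain.
        without_domain = {
            label == target_class
            for domains, label in proteins
            if domain not in domains
        }
        if with_domain == {True} and without_domain == {False}:
            return domain
    return None


if __name__ == "__main__":
    print(find_discriminating_domain(PROTEINS, "class_A"))
```

In this toy data the shared domain `D1` is what would make the proteins look similar to each other, while `D2` and `D3` carry the information that separates the two classes. A single-domain rule is the simplest possible discriminator; a learned model could combine several domains.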

This page is copyrighted by AAAI. All rights reserved.