T. R. Ioerger, L. Rendell, and S. Subramaniam
To date, the only methods that have successfully predicted protein structures have been based on identifying homologous proteins whose structures are known. However, such methods are limited by the fact that some proteins have similar structures but no significant sequence homology. We consider two ways of applying machine learning to facilitate protein structure prediction. We argue that a straightforward approach will not be able to improve on the classification accuracy achieved by clustering on alignment scores alone. In contrast, we present a novel constructive induction approach that learns better representations of amino acid sequences in terms of physical and chemical properties. Our learning method combines knowledge and search to shift the representation of sequences so that semantic similarity is more easily recognized by syntactic matching. Our approach promises not only to find new structural relationships among protein sequences, but also to expand our understanding of the roles knowledge can play in learning via experience in this challenging domain.
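
The idea of shifting a representation so that syntactic matching reflects semantic similarity can be illustrated with a minimal sketch. The property grouping below is a hand-picked assumption for illustration only, not the representation learned by the method described here, and the names PROPERTY_CLASSES, reencode, and identity are hypothetical. Two short peptides that share no residues become identical once each residue is replaced by a coarse physicochemical class.

```python
# Illustrative assumption: a hand-picked grouping of the 20 amino acids into
# coarse physicochemical classes. It is NOT the learned representation.
PROPERTY_CLASSES = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar or small
    "+": "KRH",       # positively charged
    "-": "DE",        # negatively charged
}

# Invert the grouping into a residue -> class-label lookup table.
AA_TO_CLASS = {
    aa: label for label, members in PROPERTY_CLASSES.items() for aa in members
}


def reencode(seq):
    """Rewrite a sequence of one-letter residue codes as class labels."""
    return "".join(AA_TO_CLASS[aa] for aa in seq.upper())


def identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two made-up peptides with no residue in common at any position.
seq_a = "LKVDIS"
seq_b = "IRLEVT"

raw = identity(seq_a, seq_b)                          # match on raw residues
shifted = identity(reencode(seq_a), reencode(seq_b))  # match on property classes
print(raw, shifted)
```

In this toy case the raw residue strings disagree at every position, while the re-encoded strings agree at every position. A learned representation would be chosen by search guided by domain knowledge rather than fixed by hand as it is here.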