S. M. Weiss, D. M. Cohen and N. Indurkhya
We consider the automated identification of transmembrane domains in membrane protein sequences. 324 proteins (containing 1585 segments) were examined, representing every protein in the PIR database having the transmembrane domain feature annotation. Machine learning techniques were used to evaluate the efficacy of alternative hydrophobicity measures and windowing techniques. We describe a simpler measure of hydrophobicity and a new variable window size concept. We demonstrate that these techniques are superior to some previous techniques in minimizing the segment error rate. Using these new techniques, we describe an algorithm that has a 7.9% segment error rate on the sampled proteins, while classifying 16.7% of the amino acid residues as transmembrane.
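For readers unfamiliar with windowed hydrophobicity scans, the following is a minimal sketch of the conventional fixed-window baseline, not the simpler measure or the variable window size described in this paper. It assumes the widely used Kyte-Doolittle hydropathy scale, a 19-residue window, and a cutoff of 1.6; the function names `window_hydropathy` and `candidate_segments` are hypothetical and chosen only for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not the measure proposed in the paper).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Return the mean hydropathy of every full window of `window` residues."""
    scores = [KD[aa] for aa in seq]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    averages = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        averages.append(total / window)
    return averages


def candidate_segments(seq, window=19, threshold=1.6):
    """Return (start, end) residue ranges, end exclusive, covered by windows
    whose mean hydropathy reaches `threshold` (an assumed cutoff)."""
    segments = []
    for i, avg in enumerate(window_hydropathy(seq, window)):
        if avg < threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Overlapping or adjacent window: extend the previous segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy sequence: a leucine-rich stretch flanked by charged residues.
    toy = "D" * 20 + "L" * 20 + "D" * 20
    print(candidate_segments(toy))
```

A fixed window forces a single trade-off between missing short segments and merging neighbouring ones, which is the kind of limitation a variable window size is meant to address.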