E. A. Ferrán, B. Pflugfelder, and P. Ferrara
We have recently described a method based on artificial neural networks to cluster protein sequences into families. The network was trained with Kohonen’s unsupervised-learning algorithm using, as inputs, matrix patterns derived from the bipeptide composition of the proteins. Here we apply that method to the classification of 1758 protein sequences, using as inputs a limited number of principal components of the bipeptide matrices. As a result of training, the network self-organized the activation of its neurons into a topologically ordered map, in which proteins belonging to a known family (immunoglobulins, actins, interferons, myosins, HLA histocompatibility antigens, hemoglobins, etc.) were usually associated with the same neuron or with neighboring ones. Once the topological map has been obtained, the classification of new sequences is very fast.
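
The pipeline summarized above (bipeptide matrix, principal components, Kohonen map) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 20-letter alphabet ordering, the Gaussian neighborhood, the linearly decaying learning rate and radius, the grid size, the number of components, and the toy sequences are all assumptions chosen for illustration only.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def bipeptide_matrix(seq):
    """Return a 20x20 matrix of normalized adjacent-residue pair frequencies."""
    m = np.zeros((20, 20))
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:  # skip non-standard residue codes
            m[INDEX[a], INDEX[b]] += 1
    total = m.sum()
    return m / total if total > 0 else m


def principal_components(X, k):
    """Project the rows of X onto their first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def winner(W, x):
    """Grid coordinates of the neuron whose weight vector is closest to x."""
    d = ((W - x) ** 2).sum(axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)


def train_som(X, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Kohonen map on the rows of X (assumed schedules)."""
    rng = np.random.default_rng(seed)
    sigma0 = sigma0 or max(rows, cols) / 2
    W = rng.normal(scale=X.std(), size=(rows, cols, X.shape[1]))
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            frac = step / n_steps
            lr = lr0 * (1 - frac)                     # decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)     # shrinking neighborhood
            d2 = ((grid - np.array(winner(W, x))) ** 2).sum(axis=-1)
            h = np.exp(-d2 / (2 * sigma ** 2))        # Gaussian neighborhood
            W += lr * h[..., None] * (x - W)          # pull neighbors toward x
            step += 1
    return W


# Toy usage with made-up sequences (not real proteins).
toy = ["MKVLAAGIVALLLAAG", "MKVLAAGIVGLLLAAG", "WYWYHHWYWYHHWYWY", "WYWYHHWYWYHWWYWY"]
X = np.array([bipeptide_matrix(s).ravel() for s in toy])  # 400 inputs each
Z = principal_components(X, k=2)                          # reduced inputs
W = train_som(Z, rows=3, cols=3)
for s, z in zip(toy, Z):
    print(s, "->", winner(W, z))
```

In this sketch, placing a sequence on a trained map amounts to a single call to `winner`, that is, one nearest-weight search over the neurons. Note that a genuinely new sequence would have to be projected with the mean and components computed on the training set, which this simplified `principal_components` does not return.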