Miguel A. Andrade
In this work I present an algorithm for deriving position-specific protein functional annotations. The input is the set of results from a sequence similarity search of a query sequence against a sequence database. Strings of words (descriptors) are extracted from the descriptions of the matching proteins, and the correlation between proteins sharing a descriptor and amino acid conservation is used to compute a score that indicates which descriptor is likely to best describe the function of each particular residue. Analysis of the score curves and comparison of different functions allows easy detection of the parts of the sequence associated with different functions. Different levels of functional specificity can be compared, allowing the choice of the one that best suits the function of the protein. Immediate applications of this algorithm are support for (automated) methods of protein functional annotation and database coherency checking.
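
To make the idea of a per-position descriptor score concrete, the following is a minimal sketch and not the scoring function of this work. It assumes that the hits are already aligned to the query and have the same length, that descriptors are single lower-cased words rather than strings of words, and that the score is simply the difference in conservation of the query residue between hits that carry a descriptor and hits that do not. The names `extract_descriptors`, `position_scores`, and `best_descriptor_per_position` are hypothetical, as are the toy sequences and descriptions.

```python
from collections import defaultdict


def extract_descriptors(description):
    """Split a description into lower-cased single-word descriptors (assumption)."""
    return set(description.lower().split())


def position_scores(query, hits):
    """Return one score curve per descriptor.

    query: the query sequence.
    hits:  list of (aligned_sequence, description) pairs; every aligned
           sequence is assumed to have the same length as the query.
    The score at a position is the fraction of hits WITH the descriptor
    that share the query residue, minus the same fraction among hits
    WITHOUT it. This is an illustrative stand-in for a correlation measure.
    """
    with_descriptor = defaultdict(set)
    for index, (_, description) in enumerate(hits):
        for word in extract_descriptors(description):
            with_descriptor[word].add(index)

    def conserved_fraction(group, position):
        if not group:
            return 0.0
        matches = sum(hits[i][0][position] == query[position] for i in group)
        return matches / len(group)

    all_hits = set(range(len(hits)))
    curves = {}
    for word, members in with_descriptor.items():
        others = all_hits - members
        curves[word] = [
            conserved_fraction(members, pos) - conserved_fraction(others, pos)
            for pos in range(len(query))
        ]
    return curves


def best_descriptor_per_position(curves, length):
    """Pick the highest-scoring descriptor at each position."""
    return [max(curves, key=lambda word: curves[word][pos]) for pos in range(length)]


# Toy example (invented data): two hits conserve the first half of the
# query, two conserve the second half.
query = "ACDEFG"
hits = [
    ("ACDQQQ", "kinase domain protein"),
    ("ACDAAA", "kinase"),
    ("TTTEFG", "transferase"),
    ("SSSEFG", "transferase family"),
]
curves = position_scores(query, hits)
best = best_descriptor_per_position(curves, len(query))
print(best)
```

In this toy case the curve for "kinase" is high over the first three positions and low over the last three, and the curve for "transferase" is the reverse, which is the kind of contrast between score curves that indicates sequence regions associated with different functions.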