Michael J. E. Sternberg, Hedvig Hegyi, Suhail A. Islam, Jingchu Luo, and Robert B. Russell
The automatic identification of protein domains from coordinates is the first step in the classification of protein folds and is therefore required for the databases that guide structure prediction. Most algorithms encode a single concept and sometimes yield assignments that are inconsistent with the generally accepted perception of domains. We describe the development of an automatic approach to reliably identify domains from protein coordinates. The algorithm is benchmarked against a manual identification of the domains in 284 representative protein chains. The first step is the domain assignment by distance (DAD) algorithm, which considers the density of inter-residue contacts represented in a contact matrix. This algorithm yields 85% agreement with the manual assignment. The paper then considers how the reliability of these assignments could be evaluated. Finally, the use of structural comparison with the STAMP algorithm to validate domain assignments is reported for a test case.
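The exact DAD scoring is not spelled out in this abstract, so the following is only a minimal sketch of the general idea of splitting a chain by contact density in a contact matrix. It assumes one point per residue (for example the C-alpha atom), a hypothetical 8 Å contact cutoff, and a single cut into two continuous segments; the names `contact_matrix`, `best_single_split`, `cutoff`, and `min_size` are illustrative and are not taken from the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_matrix(coords: Sequence[Coord], cutoff: float = 8.0) -> List[List[bool]]:
    """N x N contact map: True where two residues lie within `cutoff` angstroms.

    The 8.0 default is a hypothetical choice for illustration; self-contacts
    on the diagonal are set to False.
    """
    n = len(coords)
    return [
        [i != j and math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
        for i in range(n)
    ]


def _count(contacts: List[List[bool]], rows: range, cols: range) -> int:
    """Number of True entries in the sub-matrix contacts[rows][cols]."""
    return sum(contacts[i][j] for i in rows for j in cols)


def best_single_split(
    contacts: List[List[bool]], min_size: int = 10
) -> Tuple[Optional[int], float]:
    """Find the single cut k that makes residues [0, k) and [k, N) look most like domains.

    Score = inter-domain contact density / the weaker intra-domain density,
    so lower is better. This is an assumed criterion, not the published DAD
    score. Returns (None, inf) if the chain is too short to split.
    """
    if min_size < 2:
        raise ValueError("min_size must be at least 2")
    n = len(contacts)
    best_k: Optional[int] = None
    best_score = math.inf
    for k in range(min_size, n - min_size + 1):
        first, second = range(k), range(k, n)
        # Density = observed contacts / possible residue pairs in each block.
        inter = _count(contacts, first, second) / (k * (n - k))
        # Intra blocks count ordered pairs (i, j), i != j, hence k * (k - 1).
        intra_a = _count(contacts, first, first) / (k * (k - 1))
        intra_b = _count(contacts, second, second) / ((n - k) * (n - k - 1))
        weakest = min(intra_a, intra_b)
        score = inter / weakest if weakest > 0 else math.inf
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score
```

A score of zero means that no contacts cross the cut. Real chains can contain more than two domains and domains built from discontinuous segments, which this single-cut sketch does not attempt to handle.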