Thomas R. Ioerger, Thomas Holton, Jon A. Christopher, and James C. Sacchettini
X-ray crystallography is the most widely used method for determining the three-dimensional structures of proteins and other macromolecules. One of the most difficult steps in crystallography is interpreting the electron density map to build the final model. This is often done manually by crystallographers, a process that is both time-consuming and error-prone. In this paper, we introduce a new automated system called TEXTAL for interpreting electron density maps using pattern recognition. Given a map to be modeled, TEXTAL divides the map into small regions and then searches a database of maps for proteins whose structures have already been solved, looking for regions with a similar pattern of density. When a match is found, the coordinates of atoms in the region are inferred by analogy. The key to making the database lookup efficient is to extract numeric features that represent the patterns in each region and to compare feature values using a weighted Euclidean distance metric. It is crucial that the features be rotation-invariant, since regions with similar patterns of density can appear in any orientation. This pattern-recognition approach can take advantage of data accumulated in large crystallographic databases to effectively learn the association between electron density and molecular structure by example.
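As a rough illustration of the lookup idea described above, the following Python sketch computes a few rotation-invariant features for a region of density and finds the closest database entry under a weighted Euclidean distance. It is a minimal sketch under stated assumptions, not the TEXTAL implementation: the three features (mean density, standard deviation, and distance from the region center to the center of mass), the function names, the weights, and the toy data are all hypothetical choices made for this example.

```python
import math

def region_features(points):
    """Compute illustrative rotation-invariant features for one region.

    points: list of (x, y, z, density) tuples, with coordinates given
    relative to the region center. Assumes the total density is nonzero.
    The feature set here is an assumption chosen for illustration.
    """
    n = len(points)
    densities = [p[3] for p in points]
    mean = sum(densities) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in densities) / n)
    total = sum(densities)
    # Center of mass of the density; its distance from the region
    # center does not change when the region is rotated about that center.
    cx = sum(x * d for x, y, z, d in points) / total
    cy = sum(y * d for x, y, z, d in points) / total
    cz = sum(z * d for x, y, z, d in points) / total
    com_dist = math.sqrt(cx * cx + cy * cy + cz * cz)
    return [mean, std, com_dist]

def weighted_distance(f, g, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, f, g)))

def best_match(query, database, weights):
    """Return the (label, features) entry closest to the query features."""
    return min(database, key=lambda entry: weighted_distance(query, entry[1], weights))

def rotate_z_90(points):
    """Rotate a region by 90 degrees about the z axis (for demonstration)."""
    return [(-y, x, z, d) for x, y, z, d in points]

# Toy region and database (hypothetical values, not crystallographic data).
region = [(1.0, 0.0, 0.0, 2.0), (0.0, 1.0, 0.0, 1.0),
          (0.0, 0.0, 1.0, 1.0), (-1.0, 0.0, 0.0, 0.5)]
database = [("region_A", region_features(region)),
            ("region_B", [5.0, 5.0, 5.0])]
weights = [1.0, 1.0, 1.0]  # assumed equal weights

# A rotated copy of the region should still match its original entry.
query = region_features(rotate_z_90(region))
match_label, _ = best_match(query, database, weights)
```

Because the features depend only on the density values and on distances from the region center, the rotated copy yields the same feature vector (up to floating-point rounding) and is matched to its unrotated counterpart. In practice the weights would need to be tuned so that more informative features contribute more to the distance.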