John Biggs, Calton Pu, and Philip Bourne
The maintenance of software that uses a rapidly evolving data annotation scheme is time-consuming and expensive. At the same time, without current software, the annotation scheme itself becomes limited and is less likely to be widely adopted. A solution to this problem has been developed for the macromolecular Crystallographic Information File (mmCIF) annotation scheme. The approach could be generalized to a variety of annotation schemes used or proposed for molecular biology data. mmCIF provides a highly structured and complete annotation for describing NMR and X-ray crystallographic data and the resulting macromolecular structures. This annotation is maintained in the mmCIF dictionary, which currently contains over 3,200 terms. A major challenge is to maintain code for converting between mmCIF and Protein Data Bank (PDB) annotations while both continue to evolve. The solution has been to define a simple domain-specific language (DSL), which is added to the extensive annotation already found in the mmCIF dictionary. The DSL calls specific mapping modules for each category of data item in the mmCIF dictionary. Adding or changing the mapping between PDB and mmCIF data items is straightforward, since data categories (and hence mapping modules) correspond to elements of macromolecular structure familiar to the experimentalist. Each time a change is made to the macromolecular annotation, the corresponding change is made to the easily located and modifiable mapping modules. A code generator is then called, which reads the mapping modules and creates a new executable for performing the data conversion. In this way, code is easily kept current by individuals who have limited programming skill but understand macromolecular structure and the details of the annotation scheme. Most importantly, the conversion process becomes part of the global dictionary and is not open to differing interpretations by research groups writing code based on the dictionary contents.
Details of the DSL and code generator are provided.
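As an illustration only, the idea of per-category mapping modules feeding a code generator can be sketched in Python. Everything named here is an assumption made for the sketch: the table layout, the names `MAPPINGS`, `generate_converter_source`, and `build_converter`, and the choice of the `atom_site` category with PDB `ATOM` column ranges. The abstract does not specify the DSL syntax or the language the generator is written in, so this is not the authors' implementation.

```python
# Hypothetical mapping table standing in for the per-category mapping modules.
# Each item rule is (mmCIF item name, first PDB column, last PDB column),
# with columns 1-based and inclusive, as in the PDB fixed-column ATOM record.
MAPPINGS = {
    "atom_site": {
        "record": "ATOM  ",
        "items": [
            ("id", 7, 11),
            ("label_atom_id", 13, 16),
            ("label_comp_id", 18, 20),
            ("Cartn_x", 31, 38),
            ("Cartn_y", 39, 46),
            ("Cartn_z", 47, 54),
        ],
    },
}


def generate_converter_source(mappings):
    """Emit Python source for a convert() function from the mapping table."""
    lines = [
        "def convert(pdb_lines):",
        "    rows = []",
        "    for line in pdb_lines:",
        "        line = line.rstrip()",
    ]
    for category, spec in mappings.items():
        lines.append(f"        if line.startswith({spec['record']!r}):")
        lines.append("            row = {}")
        for item, first, last in spec["items"]:
            key = f"_{category}.{item}"
            # Convert 1-based inclusive columns to a Python slice.
            lines.append(
                f"            row[{key!r}] = line[{first - 1}:{last}].strip()"
            )
        lines.append("            rows.append(row)")
    lines.append("    return rows")
    return "\n".join(lines) + "\n"


def build_converter(mappings):
    """Compile the generated source and return the resulting convert()."""
    namespace = {}
    exec(generate_converter_source(mappings), namespace)
    return namespace["convert"]
```

In this sketch, a change to the annotation means editing one entry in `MAPPINGS` and calling `build_converter` again; the conversion code itself is never edited by hand, which mirrors the maintenance workflow described above.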