David B. Searls
Genetic information, as expressed in the four-letter code of the DNA of living organisms, represents a complex and richly expressive natural knowledge representation system, capturing procedural information that describes how to create and maintain life. The study of its semantics (i.e., the field of molecular biology) has yielded a wealth of information, but its syntax has been elaborated primarily at the lowest lexical levels, without the benefit of formal computational approaches that might help to organize its description and analysis. This paper discusses such an approach, using generative grammars to express the information in DNA sequences in a declarative, hierarchical manner. A prototype implemented in a Prolog-based Definite Clause Grammar system is presented, which allows such declarative descriptions to be used directly to analyze genetic information by parsing DNA. Examples illustrate the utility of this method in molecular biology, and speed-ups and extensions are also proposed.
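To make the idea of parsing DNA from a declarative grammar concrete, the following is a minimal sketch in Python rather than in the Prolog-based Definite Clause Grammar system of the prototype. The toy grammar (a start codon, any number of non-stop codons, then a stop codon) and the names `GRAMMAR`, `parse`, `parse_sequence`, and `recognizes` are illustrative assumptions, not taken from the paper. The grammar is plain data, and a generic backtracking parser interprets it, loosely mirroring how a logic grammar threads the remaining input through each rule.

```python
from itertools import product

STOPS = ("TAA", "TAG", "TGA")

# Hypothetical toy grammar (an assumption for illustration, not the paper's).
# Each nonterminal maps to a list of alternatives; each alternative is a
# sequence of nonterminal names or terminal strings of bases.
GRAMMAR = {
    "gene": [["start", "codons", "stop"]],
    "start": [["ATG"]],
    "stop": [[s] for s in STOPS],
    "codons": [["codon", "codons"], []],  # zero or more codons
    "codon": [
        ["".join(t)]
        for t in product("ACGT", repeat=3)
        if "".join(t) not in STOPS
    ],
}


def parse(symbol, seq, pos=0):
    """Yield every position at which `symbol` can end when matched from `pos`."""
    if symbol not in GRAMMAR:
        # Terminal: a literal run of bases that must appear at `pos`.
        if seq.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    # Nonterminal: try each alternative in turn (backtracking).
    for alternative in GRAMMAR[symbol]:
        yield from parse_sequence(alternative, seq, pos)


def parse_sequence(symbols, seq, pos):
    """Match a sequence of symbols left to right, yielding each end position."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    for mid in parse(first, seq, pos):
        yield from parse_sequence(rest, seq, mid)


def recognizes(symbol, seq):
    """True if `symbol` derives the whole of `seq`."""
    return any(end == len(seq) for end in parse(symbol, seq))
```

For instance, `recognizes("gene", "ATGGCCTAA")` succeeds, while a sequence lacking a stop codon, such as `"ATGGCC"`, is rejected. Keeping the rules as data separate from the parser is what makes the description declarative: the grammar can be extended with new features without touching the parsing code.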