John R. Koza and David Andre
The goal of automatic programming is to create, in an automated way, a computer program that enables a computer to solve a problem. Ideally, an automatic programming system should require the user to prespecify as little as possible about the problem environment. In particular, it is desirable that the user not be required to prespecify the architecture of the ultimate solution to the problem. The question of how to automatically create the architecture of the overall program in an evolutionary approach to automatic programming, such as genetic programming, has a parallel in the biological world: how new structures and behaviors are created in living things. This, in turn, corresponds to the question of how new DNA that encodes a new protein is created in more complex organisms. This chapter describes how the biological theory of gene duplication described in Susumu Ohno’s provocative book, Evolution by Means of Gene Duplication, was brought to bear on the problem of architecture discovery in genetic programming. The resulting biologically motivated approach uses six new architecture-altering operations to enable genetic programming to automatically discover the architecture of the solution at the same time as it is evolving a solution to the problem. Genetic programming with the architecture-altering operations is used to evolve a computer program to classify a given protein segment as being a transmembrane domain or a non-transmembrane area of the protein (without using biochemical knowledge, such as the hydrophobicity values used in human-written algorithms for this task). The best genetically evolved program achieved an out-of-sample error rate lower than those of previously reported human-written algorithms. This is an instance of an automated machine learning algorithm matching human performance on a non-trivial problem.