Rebecca Parsons and Mark E. Johnson
Applying genetic algorithms to DNA sequence assembly is not straightforward. Non-standard and even counter-intuitive parameter settings have yielded significant improvements in performance, solution quality, and the range of problem sizes to which the approach applies. Specifically, the solution time for a 10kb data set was reduced by an order of magnitude, and a 20kb data set that the genetic algorithm had not previously solved was solved in a time that represents only a linear increase over the 10kb data set. Significant progress has also been made on a 35kb data set representing real biological data: a single-contig solution was found for a 752-fragment subset of the data set, and a 15-contig solution was found for the full data set. This paper discusses the new results, the modifications to the previous genetic algorithm used in this study, the experimental design process by which the new results were obtained, the questions raised by these results, and some preliminary attempts to explain them.