Gary Benson and Lan Dong
One of the less well understood mutational transformations that act upon DNA is tandem duplication. In this process, a stretch of DNA is duplicated to produce two or more adjacent copies, resulting in a tandem repeat. Over time, the copies undergo additional mutations so that typically, multiple approximate tandem copies are present. An interesting feature of tandem repeats is that the duplicated copies are preserved together, making it possible to do "phylogenetic analysis" on a single sequence. This involves using the pattern of mutations among the copies to determine a minimal or a most likely history for the repeat. A history tries to describe the interwoven pattern of substitutions, indels, and duplication events in such away as to minimize the number of identical mutations that arise independently. Because the copies are adjacent and ordered, the history problem can not be solved by standard phylogeny algorithms. In this paper, we introduce several versions of the tandem repeat history problem, develop algorithmic solutions and evaluate their performance. We also develop ways to visualize important features of a history with the goal of discovering properties of the duplication mechanism.