Eric Harley, Anthony J. Bonner, and Nathan Goodman
This paper proposes a simplified approach to the assembly of large physical genome maps. The approach focuses on two key problems: (i) the integration of diverse forms of data from numerous sources, and (ii) the detection and removal of errors and anomalies in the data. The approach simplifies map assembly by dividing it into three phases: overlap, linkage, and ordering. In the first phase, all forms of overlap data are integrated into a simple abstract structure, called clusters, where each cluster is a set of mutually overlapping DNA segments. This phase filters out many questionable overlaps in the mapping data. In the second phase, clusters are linked together into a weighted intersection graph. False links between widely separated regions of the genome show up as crooked, branching structures in the graph. Removing these false links produces graphs that are straight, reflecting the linear structure of chromosomes. From these straight graphs, the third phase constructs a physical map. Graph algorithms and graph visualization play key roles in implementing the approach. The approach is still at an early stage of development: it has been tested on real and simulated mapping data, and the results look promising. This paper describes the first two phases of the approach in detail and reports on our progress to date.
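To make the linkage phase concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each cluster is represented as a set of segment names, that two clusters are linked when they share at least one segment, and that the weight of a link is the number of shared segments. It also uses a deliberately crude stand-in for spotting "branching structures": any cluster with more than two links is flagged. The function names, the `min_shared` parameter, and the example data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_intersection_graph(clusters, min_shared=1):
    """Link every pair of clusters that share at least `min_shared` segments.

    Returns a dict mapping (i, j) with i < j to the number of shared
    segments, which serves as the (assumed) edge weight.
    """
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(clusters), 2):
        shared = len(a & b)
        if shared >= min_shared:
            edges[(i, j)] = shared
    return edges


def branch_points(edges):
    """Return the clusters with more than two links, in ascending order.

    A node inside a simple path has at most two neighbours, so a higher
    degree is used here as a rough hint of a possible false link.
    """
    degree = defaultdict(int)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return sorted(node for node, d in degree.items() if d > 2)


# Hypothetical data: clusters 0-3 form one chain and clusters 4-5 another.
# Segment "z" appears in clusters 1 and 5, joining the two chains.
clusters = [
    {"a", "b", "c"},
    {"b", "c", "d", "z"},
    {"d", "e"},
    {"e", "f"},
    {"x", "y"},
    {"y", "z"},
]

edges = build_intersection_graph(clusters)
suspects = branch_points(edges)
print(edges)
print(suspects)
```

In this toy input, cluster 1 is the only flagged node, because the link through segment "z" gives it a third neighbour. A real intersection graph of overlapping clusters can contain nodes of degree greater than two without any error in the data, so a degree test of this kind could only nominate candidates for closer inspection, for instance by the graph visualization mentioned above.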