S. S. Skiena and G. Sundaram
We present a new practical algorithm for resolving the experimental data of restriction site analysis, a common technique for mapping DNA. Specifically, we assert that multiple digests with a single restriction enzyme can provide sufficient information to identify the positions of the restriction sites with high probability. The motivation for the new approach comes from combinatorial results on the number of mutually homeometric sets in one dimension, where two sets of n points are homeometric if the multisets of (n choose 2) distances they determine are the same. Since experimental data contains error, we propose algorithms for reconstructing sets from noisy interpoint distances, including the possibility of missing fragments. We analyze the performance of these algorithms under a reasonable probability distribution, establishing a relative error limit of r = O(1/n^2) beyond which our technique becomes infeasible. Through simulations, we establish that our technique is robust enough to reconstruct data with relative errors of up to 7.0% in the measured fragment lengths for typical problems, which appears sufficient for certain biological applications.
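The definition of homeometric sets can be illustrated with a minimal Python sketch. It is not the reconstruction algorithm of the paper; it only checks the definition for exact, noise-free integer points. The function names `distance_multiset` and `are_homeometric` and the two example point sets are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def distance_multiset(points):
    """Return the multiset of pairwise distances of a 1-D point set.

    For n points the multiset contains n * (n - 1) / 2 distances.
    """
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def are_homeometric(p, q):
    """True if p and q determine the same multiset of pairwise distances."""
    return distance_multiset(p) == distance_multiset(q)


# Illustrative point sets (hypothetical site positions, exact integers).
a = [0, 1, 4, 10, 12, 17]
b = [0, 1, 8, 11, 13, 17]

# Mirror image of a on the same interval; reflection preserves all distances.
a_mirror = sorted(max(a) - x for x in a)

print(are_homeometric(a, b))         # same distance multiset?
print(are_homeometric(a, a_mirror))  # a set and its reflection
print(a_mirror == b)                 # is b merely the reflection of a?
```

Here `a` and `b` share the same 15 pairwise distances even though `b` is neither `a` nor its mirror image, so the distances alone cannot distinguish them. Measured fragment lengths carry error, so a practical comparison would need a tolerance rather than the exact equality used in this sketch.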