Thomas W. Blackwell, Eric Rouchka, and David J. States
While assembling individual BAC sequences into extended regions of human genomic sequence, we encountered several instances in which a region of the genome had been sequenced independently using reagents derived from two different individuals. Comparing these sequences allows us to analyze the frequency and distribution of single nucleotide polymorphisms (SNPs) in the human genome. The observed transition/transversion frequencies are consistent with a biological origin for the sequence discrepancies rather than sequencing error, which suggests that the data produced by large sequencing centers are accurate enough to serve as the basis for SNP analysis. The observed distribution of SNPs in the human genome is not uniform.
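To illustrate the transition/transversion argument, the following is a minimal sketch, not the procedure used in this work, of how mismatches between two aligned sequences might be classified. Of the twelve possible single-base substitutions, four are transitions (purine to purine or pyrimidine to pyrimidine) and eight are transversions, so discrepancies that were random with respect to base identity would give a ratio near 1:2, whereas real polymorphism typically shows an excess of transitions. The function name `classify_mismatches` and the decision to skip gaps and ambiguity codes are assumptions made for this example.

```python
# Purines and pyrimidines; a transition stays within one class.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_mismatches(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Assumes equal-length, pre-aligned strings. Positions containing a gap
    or ambiguity code (anything other than A, C, G, T) are skipped.
    Returns a (transitions, transversions) tuple.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions
```

For example, `classify_mismatches("ACGTACGT", "GCGTCCGA")` finds one transition (A/G) and two transversions (A/C and T/A).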
We report an apparent duplication in the human genome, extending over more than 130 kb, between chromosome bands 1p34 and 16p13. Independently derived sequences covering these regions are more than 99.9% identical, indicating that the duplication occurred quite recently. FISH mapping results reported by the relevant laboratories indicate that the human population may be polymorphic for this duplication.
We present a population genetic theory for the expected distribution of SNPs and, from this theory and the observed locations of polymorphisms, derive an algorithm for probabilistically segmenting genomic sequence into regions that are identical by descent (IBD) between two individuals. Using these methods and a random mating model for the human population, we estimate the mutation rate in the human genome.
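As a rough illustration of what such a segmentation could look like, the sketch below labels fixed-size windows as IBD or non-IBD from their SNP counts using a two-state hidden Markov model with Poisson emissions and Viterbi decoding. This is a swapped-in, generic technique chosen for brevity, not the algorithm derived in this work. The function name `segment_ibd` and every parameter value (per-base discrepancy rates, switching probability, window size) are hypothetical placeholders.

```python
import math


def segment_ibd(snp_counts, window_bp, rate_ibd=1e-5, rate_non_ibd=1e-3,
                switch_prob=0.01):
    """Label each window 0 (IBD) or 1 (non-IBD) by Viterbi decoding.

    snp_counts: number of discrepancies observed in each window.
    window_bp: window length in base pairs.
    rate_ibd, rate_non_ibd: assumed per-base discrepancy rates (hypothetical).
    switch_prob: assumed probability of changing state between windows.
    """
    if not snp_counts:
        return []
    # Expected discrepancies per window in each state.
    means = (rate_ibd * window_bp, rate_non_ibd * window_bp)

    def log_poisson(k, lam):
        return k * math.log(lam) - lam - math.lgamma(k + 1)

    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob)

    # Initialise with a uniform prior over the two states.
    score = [math.log(0.5) + log_poisson(snp_counts[0], m) for m in means]
    back = []
    for k in snp_counts[1:]:
        new_score = []
        pointers = []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            pointers.append(s if stay >= switch else 1 - s)
            new_score.append(max(stay, switch) + log_poisson(k, means[s]))
        score = new_score
        back.append(pointers)

    # Trace back the most probable state path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

A call such as `segment_ibd(counts, window_bp=5000)` returns one label per window. Viterbi decoding gives a single most probable path; a fully probabilistic segmentation would instead report per-window posterior probabilities, for instance via the forward-backward algorithm.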