Saurabh Sinha and Martin Tompa, University of Washington
Understanding the mechanisms that determine the regulation of gene expression is an important and challenging problem. A fundamental subproblem is to identify DNA-binding sites for unknown regulatory factors, given a collection of genes believed to be coregulated, and given the noncoding DNA sequences near those genes. We present anenumerative statistical method for identifying good candidates for such transcription factor binding sites. Unlike local search techniques such as Expectation Maximization and Gibbs samplers that may not reach a global optimum, the method proposed here is guaranteed to produce the motifs with greatest z -scores. We discuss the results of experiments in which this algorithm was used to locate candidate binding sites in several well studied pathways of S. cerevisiae , as well as gene clusters from some of the hybridization microarray experiments.