John M. Heumann, Alan S. Lapedes, and Gary D. Stormo
We use a quantitative definition of specificity to develop a neural network for identifying common protein binding sites in a collection of unaligned DNA fragments. We show that the method is equivalent to maximizing the Information Content of the aligned sites when simple models of the binding energy and the genome are employed. The network method subsumes those simple models and can accommodate more complicated ones. We demonstrate this using a Markov model of the E. coli genome and a sampling method to approximate the partition function. A variation of Gibbs sampling aids in avoiding local minima.
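
To make the Information Content criterion concrete, the following minimal sketch computes it for a set of already aligned sites in the commonly used form: the sum over positions of f(b, i) log2(f(b, i) / p(b)), where f(b, i) is the observed frequency of base b at position i and p(b) is its background frequency. The function name `information_content`, the default uniform background, and the absence of small-sample corrections are assumptions made for illustration. The sketch shows only the quantity being maximized, not the network method itself.

```python
import math
from collections import Counter


def information_content(sites, background=None):
    """Information content (in bits) of equal-length aligned sites.

    Hypothetical helper for illustration: assumes all sites have the same
    length and, unless told otherwise, a uniform background over A, C, G, T.
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    n_sites = len(sites)
    width = len(sites[0])
    total = 0.0
    for i in range(width):
        # Count how often each base occurs in column i of the alignment.
        counts = Counter(site[i] for site in sites)
        for base, count in counts.items():
            freq = count / n_sites
            # Bases that never occur contribute 0 and are simply skipped.
            total += freq * math.log2(freq / background[base])
    return total


if __name__ == "__main__":
    # Perfectly conserved 4-base site: 2 bits per position under a uniform background.
    print(information_content(["ACGT", "ACGT", "ACGT", "ACGT"]))
    # A column matching the background distribution carries no information.
    print(information_content(["A", "C", "G", "T"]))
```

A non-uniform background can be supplied as a dictionary of base frequencies. Replacing the independent-base background with a Markov model of the genome, as the abstract describes, would require a different calculation than this sketch performs.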