C. Lefèvre and J.-E. Ikeda
A method for the pattern analysis of DNA sequence data is considered. A space-economical automaton for word recognition, together with an algorithm that compiles it in linear time, was presented elsewhere. Here we develop an algorithm that locates words in a sequence, including imperfect matches (motif search). The program was implemented on the Macintosh and used extensively to represent the word composition of DNA data. We analyze several sets of regulatory sequences to illustrate the performance of the method. In mammalian DNA, this analysis reveals "consensus motifs" that correspond to functional or putative cis-acting elements mediating the regulation of gene expression.
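The abstract does not name the automaton, so the following is only a minimal sketch under an assumption: it uses a suffix automaton (a directed acyclic word graph), a standard structure that recognizes every subword of a sequence, has a number of states linear in the sequence length, and is built online in linear time for a fixed alphabet. Imperfect matches are modeled here, as a further assumption, as Hamming-distance mismatches found by a bounded search over the automaton's transitions; the published method may use a different mismatch model. The class `SuffixAutomaton` and the methods `count` and `count_approx` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    length: int = 0                           # length of the longest word in this state
    link: int = -1                            # suffix link (-1 only for the root)
    next: dict = field(default_factory=dict)  # transitions: base -> state index
    count: int = 0                            # number of end positions (occurrences)


class SuffixAutomaton:
    """Hypothetical sketch: suffix automaton over a DNA string."""

    def __init__(self, text: str) -> None:
        self.states = [State()]
        self.last = 0
        for ch in text:
            self._extend(ch)
        self._propagate_counts()

    def _extend(self, ch: str) -> None:
        # Standard online construction: add one character to the automaton.
        st = self.states
        cur = len(st)
        st.append(State(length=st[self.last].length + 1, count=1))
        p = self.last
        while p != -1 and ch not in st[p].next:
            st[p].next[ch] = cur
            p = st[p].link
        if p == -1:
            st[cur].link = 0
        else:
            q = st[p].next[ch]
            if st[p].length + 1 == st[q].length:
                st[cur].link = q
            else:
                # Split state q by cloning it; clones own no end position.
                clone = len(st)
                st.append(State(length=st[p].length + 1, link=st[q].link,
                                next=dict(st[q].next), count=0))
                while p != -1 and st[p].next.get(ch) == q:
                    st[p].next[ch] = clone
                    p = st[p].link
                st[q].link = clone
                st[cur].link = clone
        self.last = cur

    def _propagate_counts(self) -> None:
        # Push occurrence counts up suffix links, longest states first.
        # (sorted() is O(n log n); a counting sort on length keeps this linear.)
        order = sorted(range(1, len(self.states)),
                       key=lambda i: self.states[i].length, reverse=True)
        for i in order:
            self.states[self.states[i].link].count += self.states[i].count

    def count(self, word: str) -> int:
        """Exact number of occurrences of a non-empty word."""
        s = 0
        for ch in word:
            if ch not in self.states[s].next:
                return 0
            s = self.states[s].next[ch]
        return self.states[s].count

    def count_approx(self, word: str, k: int) -> int:
        """Occurrences of all words within Hamming distance k of `word`."""
        total = 0
        # Stack of (state, position in word, mismatches so far).
        stack = [(0, 0, 0)]
        while stack:
            s, i, mism = stack.pop()
            if i == len(word):
                # Distinct paths of equal length end in distinct states,
                # so summing counts does not double-count occurrences.
                total += self.states[s].count
                continue
            for ch, t in self.states[s].next.items():
                m = mism + (ch != word[i])
                if m <= k:
                    stack.append((t, i + 1, m))
        return total


sam = SuffixAutomaton("ACGTACGA")
print(sam.count("ACG"))            # exact occurrences
print(sam.count_approx("CGT", 1))  # CGT once, plus CGA with one mismatch
```

The exact lookup walks one transition per base, so its cost depends only on the word length, not on the sequence length; the approximate search prunes any branch as soon as it exceeds `k` mismatches.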