Yin Xu, J. Ralph Einstein, Richard J. Mural, Manesh Shah, and Edward C. Uberbacher
A new version of the GRAIL system (Uberbacher and Mural, 1991; Mural et al., 1992; Uberbacher et al., 1993), called GRAIL II, has recently been developed (Xu et al., 1994). GRAIL II is a hybrid AI system that supports a number of DNA sequence analysis tools including protein-coding region recognition, PolyA site and transcription promoter recognition, gene model construction, translation to protein, and DNA/protein database searching capabilities. This paper presents the core of GRAIL II, the coding exon recognition and gene model construction algorithms. The exon recognition algorithm recognizes coding exons by combining coding feature analysis and edge signal (acceptor/donor/translation-start sites) detection. Unlike the original GRAIL system (Uberbacher and Mural, 1991; Mural et al., 1992), this algorithm uses vaxiable-length windows tailored to each potential exon candidate, making its performance almost exon lengthindependent. In this algorithm, the recognition process is divided into four steps, hdtially a large number of possible coding exon candidates are generated. Then a rule-based prescreening algorithm eliminates the majority of the improbable candidates. As the kernel of the recognition algorithm, three neural networks are trained to evaluate the remaining candidates. The outputs of the neural networks are then divided into clusters of candidates, corresponding to presumed exons. The algorithm makes its final prediction by picking the best candidate from each cluster. The gene construction algorithm (Xu, Mural and Uberbacher, 1994) uses a dynamic programming approach to build gene models by using as input the clusters predicted by the exon recognition algorithm. Extensive testing has been done on these two algorithms. The exou recognition algorithm is a significant improvement over the original GRAIL system, and the gene construction algorithm further improves the prediction results.