Anders Gorm Pedersen and Henrik Nielsen
Translation in eukaryotes does not always start at the first AUG in an mRNA, implying that context information also plays a role. This makes prediction of translation initiation sites a non-trivial task, especially when analysing EST and genome data where the entire mature mRNA sequence is not known. In this paper, we employ artificial neural networks to predict which AUG triplet in an mRNA sequence is the start codon. The trained networks correctly classified 88 % of Arabidopsis and 85 % of vertebrate AUG triplets. We find that our trained neural networks use a combination of local start codon context and global sequence information. Furthermore, analysis of false predictions shows that AUGs in frame with the actual start codon are more frequently selected than out-of-frame AUGs, suggesting that our networks use reading frame detection. A number of conflicts between neural network predictions and database annotations are analysed in detail, leading to identification of possible database errors.
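
To make the prediction task concrete, the following minimal Python sketch enumerates the candidate AUG triplets in an mRNA sequence, extracts a sequence window around each candidate, and checks whether a candidate lies in frame with an annotated start codon. It is an illustration only, not the networks or the input encoding used in this work: the function names, the window sizes (6 nucleotides on each side) and the padding symbol `N` are assumptions chosen for the example.

```python
def find_augs(mrna):
    """Return the 0-based positions of every AUG triplet in the sequence."""
    seq = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG"]


def context_window(mrna, pos, upstream=6, downstream=6, pad="N"):
    """Extract a fixed-width window around the AUG at `pos`.

    The window sizes and the padding symbol are illustrative assumptions;
    positions that fall outside the sequence are filled with `pad`.
    """
    seq = mrna.upper().replace("T", "U")
    left = seq[max(0, pos - upstream):pos]
    right = seq[pos + 3:pos + 3 + downstream]
    return left.rjust(upstream, pad) + "AUG" + right.ljust(downstream, pad)


def in_frame(pos, start_pos):
    """True if the AUG at `pos` shares a reading frame with the annotated start."""
    return (pos - start_pos) % 3 == 0


if __name__ == "__main__":
    mrna = "GGAUGCCAUGGCGAUGA"   # toy sequence, not real data
    annotated_start = 7          # hypothetical annotation for the toy sequence
    for p in find_augs(mrna):
        print(p, context_window(mrna, p), in_frame(p, annotated_start))
```

In a setting like the one described above, each window would be passed to a trained classifier, and the `in_frame` flag could be used afterwards to separate false predictions into in-frame and out-of-frame AUGs.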