Ivo L. Hofacker, Martijn A. Huynen, Peter F. Stadler, Paul E. Stolorz
The prediction of RNA secondary structure from sequence information is an important tool in biosequence analysis. However, it has typically been restricted to molecules of no more than 4000 nucleotides because of the computational complexity of the underlying dynamic programming algorithm. We describe here an approach to RNA sequence analysis based on scalable computers, which enables molecules of up to 20,000 nucleotides to be analysed. We apply the approach to the entire HIV genome, illustrating how these methods support knowledge discovery by identifying important secondary structure motifs within RNA sequence families.
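The sketch below shows why the dynamic programming cost limits sequence length. It uses the textbook Nussinov base-pair maximization recursion, a deliberate simplification. It is not the free-energy model used for the analyses described here. The function name `nussinov_max_pairs`, the pairing set `PAIRS` and the minimum hairpin size `MIN_LOOP = 3` are illustrative assumptions. The nested loops over `i`, `j` and `k` make the O(N^3) time and O(N^2) memory scaling explicit. Under that scaling, going from 4000 to 20,000 nucleotides multiplies the work by roughly 5^3 = 125 and the table size by 5^2 = 25.

```python
# A minimal sketch of the Nussinov base-pair maximization recursion.
# Assumptions (illustrative only): Watson-Crick pairs plus G-U wobble pairs,
# and a minimum hairpin loop of 3 unpaired bases. This is NOT a free-energy model.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # hypothetical minimum number of unpaired bases enclosed by a pair


def nussinov_max_pairs(seq: str) -> int:
    """Return the maximum number of nested base pairs in seq."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best score for the subsequence seq[i..j]; this is O(N^2) memory.
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing span so every subproblem is ready before it is used.
    # Spans of MIN_LOOP or less cannot contain a pair and stay at 0.
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is left unpaired.
            best = dp[i][j - 1]
            # Case 2: base j pairs with some k, leaving >= MIN_LOOP bases inside.
            # This inner loop over k is what makes the total time O(N^3).
            for k in range(i, j - MIN_LOOP):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1]
                    best = max(best, left + inner + 1)
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    # G0-C8, G1-C7 and G2-U6 form a stem that encloses the AAA hairpin loop.
    print(nussinov_max_pairs("GGGAAAUCC"))  # 3
```

Practical folding programs replace the pair count with nearest-neighbour free energies. The fill order and the cubic cost, which is what limits the feasible sequence length, are the same.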