Pierre Baldi, Søren Brunak, Yves Chauvin, and Anders Gorm Pedersen
We study, from a computational standpoint, several physical scales associated with structural features of DNA sequences, including dinucleotide scales such as base stacking energy and propeller twist, and trinucleotide scales such as bendability and nucleosome positioning. We show that these scales provide an alternative or complementary compact representation of DNA sequences. As an example, we construct a strand-invariant representation of DNA sequences. The scales can also be used to analyze and discover new DNA structural patterns, especially in combination with hidden Markov models (HMMs). The scales are applied to HMMs of human promoter sequences, revealing a number of significant differences between regions upstream and downstream of the transcriptional start point. Finally, we show, with some qualifications, that such scales are by and large independent and therefore complement each other.
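To make the idea of a scale-based representation concrete, the following is a minimal Python sketch. It maps a sequence to a numeric profile by looking up each overlapping dinucleotide in a scale table. The numbers in `TOY_CLASS_VALUES` are hypothetical placeholders, not real base stacking or propeller twist values. The sketch assumes the scale gives a dinucleotide and its reverse complement the same value (for example AA and TT). Under that assumption, the profile of the reverse-complement strand is simply the reversed profile. The `strand_invariant` function averages a profile with its reverse. It is one illustrative way to obtain a strand-invariant representation and is not necessarily the construction used in the paper.

```python
# Sketch: dinucleotide scale profiles and a simple strand-invariant summary.
# All scale values below are HYPOTHETICAL placeholders, not published data.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# One placeholder value per strand-equivalence class. The 16 dinucleotides
# form 10 classes: 6 complementary pairs (AA/TT, AC/GT, AG/CT, CA/TG,
# CC/GG, GA/TC) plus 4 self-complementary ones (AT, TA, CG, GC).
TOY_CLASS_VALUES = {
    "AA": 1.0, "AC": 2.0, "AG": 3.0, "AT": 4.0, "CA": 5.0,
    "CC": 6.0, "CG": 7.0, "GA": 8.0, "GC": 9.0, "TA": 10.0,
}


def build_symmetric_scale(class_values: dict[str, float]) -> dict[str, float]:
    """Expand class values to all 16 dinucleotides, so v(d) == v(revcomp(d))."""
    scale: dict[str, float] = {}
    for dinuc, value in class_values.items():
        scale[dinuc] = value
        scale[reverse_complement(dinuc)] = value
    return scale


def profile(seq: str, scale: dict[str, float]) -> list[float]:
    """Scale value of each overlapping dinucleotide (length len(seq) - 1)."""
    return [scale[seq[i:i + 2]] for i in range(len(seq) - 1)]


def strand_invariant(seq: str, scale: dict[str, float]) -> list[float]:
    """Average the profile with its reverse.

    For a complement-symmetric scale, this gives the same result for
    a sequence and for its reverse complement.
    """
    p = profile(seq, scale)
    return [(a + b) / 2 for a, b in zip(p, reversed(p))]


if __name__ == "__main__":
    scale = build_symmetric_scale(TOY_CLASS_VALUES)
    s = "GATTACAGGCTT"
    print(profile(s, scale))
    print(strand_invariant(s, scale))
```

Here the symmetry of the scale carries the argument. A profile read from the opposite strand is the reversed profile. Any function of a profile that does not change when the profile is reversed is therefore strand invariant. Averaging with the reverse is just the simplest such function.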