Pierre Baldi and Yves Chauvin
Hidden Markov Models (HMMs) are useful in number of tasks in computational molecular biology, and in particular to model and align protein families. We argue that ttMMs are somewhat optimal within a certain modeling hierarchy. Single first order HMMs, however, have two potential limitations: a large number of unstructured parameters, and a built-in inability to deal with long-range dependencies. Hybrid HMM/Neural Network (NN) architectures attempt to overcome these limitations. In hybrid HMM/NN, the HMM parameters are computed by a NN. This provides a reparametrization that allows for flexible control of model complexity, and incorporation of constraints. The approach is tested on the immunoglobulin family. A hybrid model is trained, and a multiple alignment derived, with less than a fourth of the number of parameters used with previous single HMMs. To capture dependencies, however, one must resort to a larger hybrid model class, where the data is modeled by multiple HMMs. The parameters of the HMMs, and their modulation as a function of input or context, is again calculated by a NN.