Jun Zhu, Jun Liu, and Charles Lawrence
Sequence alignment without the specification of gap penalties or a scoring matrix is attained by using Bayesian inference and a recursive algorithm. This procedure’s recursive algorithm sums over all possible alignments on the forward step to obtain normalizing constants essential to Bayesian inferences, and samples from the exact posterior distribution on the backward step. Since both terminal and intervening unrelated subsequences will often be excluded from an alignment, the resulting alignments may be seen as extensions of local alignments. An alignment’s significance is assessed using the Bayesian evidence. A shuffling simulation shows that Bayesian evidence against the null hypothesis tends to be a conservative measure of significance compared to classical p-values. An application to proteins from the GTPase superfamily shows that the posterior distribution of the number of gaps is often flat and that the posterior distribution of the evolutionary distance is often flat and sometimes bimodal. An alignment of 1GIA with 1ETU shows good correspondence with a structural alignment.
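The forward-sum, backward-sample recursion described above can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes fixed, hypothetical weights (`match`, `mismatch`, `gap`) instead of treating the scoring matrix and number of gaps as unknowns, and it considers only global alignment paths. The function names `forward_sums` and `sample_alignment` are likewise illustrative. The forward pass accumulates the total weight of all alignments of each prefix pair, and the backward pass uses those sums to draw one alignment with probability exactly proportional to its weight.

```python
import random


def pair_weight(a, b, match, mismatch):
    """Hypothetical weight for aligning residue a with residue b."""
    return match if a == b else mismatch


def forward_sums(x, y, match=2.0, mismatch=0.5, gap=0.2):
    """Z[i][j] = total weight of all alignments of x[:i] with y[:j]."""
    n, m = len(x), len(y)
    Z = [[0.0] * (m + 1) for _ in range(n + 1)]
    Z[0][0] = 1.0  # the empty alignment has weight 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:  # align x[i-1] with y[j-1]
                w = pair_weight(x[i - 1], y[j - 1], match, mismatch)
                total += Z[i - 1][j - 1] * w
            if i > 0:  # x[i-1] opposite a gap
                total += Z[i - 1][j] * gap
            if j > 0:  # y[j-1] opposite a gap
                total += Z[i][j - 1] * gap
            Z[i][j] = total
    return Z


def sample_alignment(x, y, Z, rng, match=2.0, mismatch=0.5, gap=0.2):
    """Walk back from (len(x), len(y)), choosing each step with probability
    proportional to its contribution to Z; returns the aligned index pairs."""
    i, j = len(x), len(y)
    pairs = []
    while i > 0 or j > 0:
        moves = []
        if i > 0 and j > 0:
            w = pair_weight(x[i - 1], y[j - 1], match, mismatch)
            moves.append(((i - 1, j - 1), Z[i - 1][j - 1] * w))
        if i > 0:
            moves.append(((i - 1, j), Z[i - 1][j] * gap))
        if j > 0:
            moves.append(((i, j - 1), Z[i][j - 1] * gap))
        targets, weights = zip(*moves)
        ni, nj = rng.choices(targets, weights=weights)[0]
        if (ni, nj) == (i - 1, j - 1):
            pairs.append((i - 1, j - 1))  # record a matched pair
        i, j = ni, nj
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    x, y = "GATTACA", "GCATGCA"  # toy sequences, not data from the paper
    Z = forward_sums(x, y)
    print("normalizing constant:", Z[len(x)][len(y)])
    print("sampled pairs:", sample_alignment(x, y, Z, random.Random(0)))
```

Here `Z[len(x)][len(y)]` plays the role of the normalizing constant, and repeated calls to `sample_alignment` give independent draws over alignments under the assumed weights. In a fuller treatment those weights would themselves be summed or sampled over rather than fixed.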