Takayuki Yoshizumi, Teruhisa Miura, and Toru Ishida, Kyoto University
The multiple sequence alignment problem is one of the important problems in Genome Informatics. The notable feature of this problem is that its state-space forms a lattice. Researchers have applied search algorithms such as A* and memory-bounded search algorithms including SNC to this problem. Unfortunately, previous work could align only seven sequences at most. Korf proposed DCBDS, which exploits the features of a grid, and suggested that DCBDS probably solved this problem, effectively. We found, however, that DCBDS was not effective for aligning many sequences. In this paper, we propose a simple and effective search algorithm, A* with Partial Expansion, for state-spaces with large branching factors. The aim of this algorithm is to store only necessary nodes for finding an optimal solution. In node expansion, A* stores all child nodes, while our algorithm stores only promising child nodes. This mechanism enables us to reduce the memory requirements during a search. We apply our algorithm to the multiple sequence alignment problem. It can align seven sequences with only 4.7% of the stored nodes required by A*.