Efficient Haplotype Inference with Boolean Satisfiability

Ines Lynce, Joao Marques-Silva

One of the main topics of research in genomics is determining the relevance of mutations, described in haplotype data, as causes of some genetic diseases. However, due to technological limitations, genotype data rather than haplotype data is usually obtained. The haplotype inference by pure parsimony (HIPP) problem consists of inferring haplotypes from genotypes such that the number of required haplotypes is minimized. Previous approaches to the HIPP problem have focused on integer programming models and branch-and-bound algorithms. In contrast, this paper proposes the use of Boolean Satisfiability (SAT). The proposed solution entails a SAT model, a number of key pruning techniques, and an iterative algorithm that enumerates the possible solution values for the target optimization problem. Experimental results, obtained on a wide range of instances, demonstrate that the SAT-based approach can be several orders of magnitude faster than existing solutions. Besides being more efficient, the SAT-based approach is also the only one capable of computing the solution for a large number of instances.
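To make the HIPP problem statement concrete, the following is a minimal illustrative sketch, not the SAT model proposed in the paper. It assumes the common encoding in which each genotype site is 0 or 1 (homozygous) or 2 (heterozygous), and a pair of haplotypes explains a genotype when both agree with every homozygous site and differ at every heterozygous site. The outer loop tries candidate solution sizes r = 1, 2, ... in increasing order (an assumed order; the abstract does not specify it), and a brute-force search stands in for the SAT check. The function names are hypothetical, and the approach is only practical for tiny inputs.

```python
from itertools import combinations, product


def explains(h1, h2, g):
    """True if the haplotype pair (h1, h2) explains genotype g."""
    # Homozygous site s in {0, 1}: both haplotypes must equal s.
    # Heterozygous site s == 2: the two haplotypes must differ.
    return all((a != b) if s == 2 else (a == b == s)
               for a, b, s in zip(h1, h2, g))


def candidate_haplotypes(genotypes):
    """All haplotypes compatible with at least one genotype."""
    cands = set()
    for g in genotypes:
        options = [(0, 1) if s == 2 else (s,) for s in g]
        cands.update(product(*options))
    return sorted(cands)


def covered(hap_set, g):
    """True if some pair drawn from hap_set (repeats allowed) explains g."""
    return any(explains(h1, h2, g) for h1 in hap_set for h2 in hap_set)


def min_haplotypes(genotypes):
    """Smallest haplotype set explaining every genotype (brute force)."""
    if not genotypes:
        return []
    cands = candidate_haplotypes(genotypes)
    # 2 * n haplotypes always suffice: one dedicated pair per genotype.
    for r in range(1, 2 * len(genotypes) + 1):
        for hap_set in combinations(cands, r):
            if all(covered(hap_set, g) for g in genotypes):
                return list(hap_set)
    return None  # not reached for well-formed input


genotypes = [(2, 0, 2), (2, 0, 0), (0, 0, 2)]
solution = min_haplotypes(genotypes)
print(len(solution), solution)
```

In this example three haplotypes are needed: the second and third genotypes each force a distinct partner for the shared haplotype (0, 0, 0), and those two partners together also explain the first genotype. Replacing the inner brute-force search with a SAT query for each value of r is the kind of substitution the abstract describes, though the paper's actual encoding and pruning techniques are not reproduced here.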

Subjects: 15. Problem Solving; 1. Applications


This page is copyrighted by AAAI. All rights reserved.