Efficient Haplotype Inference with Answer Set Programming

Esra Erdem, Ferhan Ture

Identifying maternal and paternal inheritance is essential to be able to find the set of genes responsible for a particular disease. However, due to technological limitations, we have access to genotype data (genetic makeup of an individual), and determining haplotypes (genetic makeup of the parents) experimentally is a costly and time consuming procedure. With these biological motivations, we study a computational problem, called Haplotype Inference by Pure Parsimony (HIPP), that asks for the minimal number of haplotypes that form a given set of genotypes. HIPP has been studied using integer linear programming, branch and bound algorithms, SAT-based algorithms, or pseudo-boolean optimization methods. We introduce a new approach to solving HIPP, using Answer Set Programming (ASP). According to our experiments with a large number of problem instances (some automatically generated and some real), our ASP-based approach solves the most number of problems compared with other approaches. Due to the expressivity of the knowledge representation language of ASP, our approach allows us to solve variations of HIPP, e.g., with additional domain specific information, such as patterns/parts of haplotypes observed for some gene family, or with some missing genotype information. In this sense, the ASP-based approach is more general than the existing approaches to haplotype inference.

Subjects: 11. Knowledge Representation; 1. Applications

Submitted: Apr 14, 2008

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.