Results 1  10
of
43
Efficient haplotype inference with pseudoBoolean optimization
 In Algebraic Biology 2007
, 2007
"... Abstract. Haplotype inference from genotype data is a key computational problem in bioinformatics, since retrieving directly haplotype information from DNA samples is not feasible using existing technology. One of the methods for solving this problem uses the pure parsimony criterion, an approach kn ..."
Abstract

Cited by 17 (7 self)
 Add to MetaCart
(Show Context)
Abstract. Haplotype inference from genotype data is a key computational problem in bioinformatics, since retrieving directly haplotype information from DNA samples is not feasible using existing technology. One of the methods for solving this problem uses the pure parsimony criterion, an approach known as Haplotype Inference by Pure Parsimony (HIPP). Initial work in this area was based on a number of different Integer Linear Programming (ILP) models and branch and bound algorithms. Recent work has shown that the utilization of a Boolean Satisfiability (SAT) formulation and state of the art SAT solvers represents the most efficient approach for solving the HIPP problem. Motivated by the promising results obtained using SAT techniques, this paper investigates the utilization of modern PseudoBoolean Optimization (PBO) algorithms for solving the HIPP problem. The paper starts by applying PBO to existing ILP models. The results are promising, and motivate the development of a new PBO model (RPoly) for the HIPP problem, which has a compact representation and eliminates key symmetries. Experimental results indicate that RPoly outperforms the SATbased approach on most problem instances, being, in general, significantly more efficient.
Efficient Haplotype Inference with Answer Set Programming
"... Identifying maternal and paternal inheritance is essential to be able to find the set of genes responsible for a particular disease. However, due to technological limitations, we have access to genotype data (genetic makeup of an individual), and determining haplotypes (genetic makeup of the parents ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
(Show Context)
Identifying maternal and paternal inheritance is essential to be able to find the set of genes responsible for a particular disease. However, due to technological limitations, we have access to genotype data (genetic makeup of an individual), and determining haplotypes (genetic makeup of the parents) experimentally is a costly and time consuming procedure. With these biological motivations, we study a computational problem, called Haplotype Inference by Pure Parsimony (HIPP), that asks for the minimal number of haplotypes that form a given set of genotypes. HIPP has been studied using integer linear programming, branch and bound algorithms, SATbased algorithms, or pseudoboolean optimization methods. We introduce a new approach to solving HIPP, using Answer Set Programming (ASP). According to our experiments with a large number of problem instances (some automatically generated and some real), our ASPbased approach solves the most number of problems compared with other approaches. Due to the expressivity of the knowledge representation language of ASP, our approach allows us to solve variations of HIPP, e.g., with additional domain specific information, such as patterns/parts of haplotypes observed for some gene family, or with some missing genotype information. In this sense, the ASPbased approach is more general than the existing approaches to haplotype inference.
Boosting Haplotype Inference with Local Search
"... Abstract. A very challenging problem in the genetics domain is to infer haplotypes from genotypes. This process is expected to identify genes affecting health, disease and response to drugs. One of the approaches to haplotype inference aims to minimise the number of different haplotypes used, and is ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
(Show Context)
Abstract. A very challenging problem in the genetics domain is to infer haplotypes from genotypes. This process is expected to identify genes affecting health, disease and response to drugs. One of the approaches to haplotype inference aims to minimise the number of different haplotypes used, and is known as haplotype inference by pure parsimony (HIPP). The HIPP problem is computationally difficult, being NPhard. Recently, a SATbased method (SHIPs) has been proposed to solve the HIPP problem. This method iteratively considers an increasing number of haplotypes, starting from an initial lower bound. Hence, one important aspect of SHIPs is the lower bounding procedure, which reduces the number of iterations of the basic algorithm, and also indirectly simplifies the resulting SAT model. This paper describes the use of local search to improve existing lower bounding procedures. The new lower bounding procedure is guaranteed to be as tight as the existing procedures. In practice the new procedure is in most cases considerably tighter, allowing significant improvement of performance on challenging problem instances. 1
SAT in Bioinformatics: Making the Case with Haplotype Inference
"... Abstract. Mutation in DNA is the principal cause for differences among human beings, and Single Nucleotide Polymorphisms (SNPs) are the most common mutations. Hence, a fundamental task is to complete a map of haplotypes (which identify SNPs) in the human population. Associated with this effort, a ke ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
(Show Context)
Abstract. Mutation in DNA is the principal cause for differences among human beings, and Single Nucleotide Polymorphisms (SNPs) are the most common mutations. Hence, a fundamental task is to complete a map of haplotypes (which identify SNPs) in the human population. Associated with this effort, a key computational problem is the inference of haplotype data from genotype data, since in practice genotype data rather than haplotype data is usually obtained. Recent work has shown that a SATbased approach is by far the most efficient solution to the problem of haplotype inference by pure parsimony (HIPP), being several orders of magnitude faster than existing integer linear programming and branch and bound solutions. This paper proposes a number of key optimizations to the the original SATbased model. The new version of the model can be orders of magnitude faster than the original SATbased HIPP model, particularly on biological test data. 1
Efficient Haplotype Inference with Combined CP and OR Techniques
"... Abstract. Haplotype inference has relevant biological applications, and represents a challenging computational problem. Among others, pure parsimony provides a viable modeling approach for haplotype inference and provides a simple optimization criterion. Alternative approaches have been proposed for ..."
Abstract

Cited by 6 (4 self)
 Add to MetaCart
(Show Context)
Abstract. Haplotype inference has relevant biological applications, and represents a challenging computational problem. Among others, pure parsimony provides a viable modeling approach for haplotype inference and provides a simple optimization criterion. Alternative approaches have been proposed for haplotype inference by pure parsimony (HIPP), including branch and bound, integer programming and, more recently, propositional satisfiability and pseudoBoolean optimization (PBO). Among these, the currently best performing HIPP approach is based on PBO. This paper proposes a number of effective improvements to PBObased HIPP, including the use of lower bounding and pruning techniques effective with other approaches. The new PBObased HIPP approach reduces by 50 % the number of instances that remain unsolvable by HIPP based approaches. 1
Stochastic local search for largescale instances of the Haplotype Inference Problem by Parsimony Abstract
"... Haplotype Inference is a challenging problem in bioinformatics that consists in inferring the basic genetic constitution of diploid organisms on the basis of their genotype. This information allows researchers to perform association studies for the genetic variants involved in diseases and the indiv ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
(Show Context)
Haplotype Inference is a challenging problem in bioinformatics that consists in inferring the basic genetic constitution of diploid organisms on the basis of their genotype. This information allows researchers to perform association studies for the genetic variants involved in diseases and the individual responses to therapeutic agents. A notable approach to the problem is to encode it as a combinatorial problem (under certain hypotheses, such as the pure parsimony criterion) and to solve it using offtheshelf combinatorial optimization techniques. The main methods applied to Haplotype Inference are either simple greedy heuristic or exact methods (Integer Linear Programming, Semidefinite Programming, SAT and pseudoboolean encoding) that, at present, are adequate only for moderate size instances. In this paper, we present and discuss an approach based on the combination of local search metaheuristics and a reduction procedure based on an analysis of the problem structure. Some relevant design issues are first described, then a family of local search metaheuristics is defined to tackle the Haplotype Inference. Results on common Haplotype Inference benchmarks show that the approach achieves a good tradeoff between solution quality and execution time.
Efficient and Tight Upper Bounds for Haplotype Inference by Pure Parsimony using Delayed Haplotype Selection
"... Abstract. Haplotype inference from genotype data is a key step towards a better understanding of the role played by genetic variations on inherited diseases. One of the most promising approaches uses the pure parsimony criterion. This approach is called Haplotype Inference by Pure Parsimony (HIPP) a ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
(Show Context)
Abstract. Haplotype inference from genotype data is a key step towards a better understanding of the role played by genetic variations on inherited diseases. One of the most promising approaches uses the pure parsimony criterion. This approach is called Haplotype Inference by Pure Parsimony (HIPP) and is NPhard as it aims at minimising the number of haplotypes required to explain a given set of genotypes. The HIPP problem is often solved using constraint satisfaction techniques, for which the upper bound on the number of required haplotypes is a key issue. Another very wellknown approach is Clark’s method, which resolves genotypes by greedily selecting an explaining pair of haplotypes. In this work, we combine the basic idea of Clark’s method with a more sophisticated method for the selection of explaining haplotypes, in order to explicitly introduce a bias towards parsimonious explanations. This new algorithm can be used either to obtain an approximated solution to the HIPP problem or to obtain an upper bound on the size of the pure parsimony solution. This upper bound can then used to efficiently encode the problem as a constraint satisfaction problem. The experimental evaluation, conducted using a large set of real and artificially generated examples, shows that the new method is much more effective than Clark’s method at obtaining parsimonious solutions, while keeping the advantages of simplicity and speed of Clark’s method. 1
Integer Programming Formulations and Computations solving Phylogenetic and Population Genetic Problems with Missing or Genotypic Data
"... Abstract. Several central and wellknown combinatorial problems in phylogenetics and population genetics have efficient, elegant solutions when the input is complete or consists of haplotype data, but lack efficient solutions when input is either incomplete, consists of genotype data, or is for prob ..."
Abstract

Cited by 4 (1 self)
 Add to MetaCart
(Show Context)
Abstract. Several central and wellknown combinatorial problems in phylogenetics and population genetics have efficient, elegant solutions when the input is complete or consists of haplotype data, but lack efficient solutions when input is either incomplete, consists of genotype data, or is for problems generalized from decision questions to optimization questions. Unfortunately, in biological applications, these harder problems arise very often. Previous research has shown that integerlinear programming can sometimes be used to solve hard problems in practice on a range of data that is realistic for current biological applications. Here, we describe a set of related integer linear programming (ILP) formulations for several additional problems, most of which are known to be NPhard. These ILP formulations address either the issue of missing data, or solve Haplotype Inference Problems with objective functions that model more complex biological phenomena than previous formulations. These ILP formulations solve efficiently on data whose composition reflects a range of data of current biological interest. We also assess the biological quality of the ILP solutions: some of the problems, although not all, solve with excellent quality. These results give a practical way to solve instances of some central, hard biological problems, and give practical ways to assess how well certain natural objective functions reflect complex biological phenomena. Perl code to generate the ILPs (for input to CPLEX) is on the web at wwwcsif.cs.ucdavis.edu/~gusfield 1
Gaspero. Twolevel ACO for haplotype inference under pure parsimony
 In Ant Colony Optimization and Swarm Intelligence, 6th International Workshop, ANTS 2008, volume 5217 of Lecture Notes in Computer Science
, 2008
"... Abstract. Haplotype Inference is a challenging problem in bioinformatics that consists in inferring the basic genetic constitution of diploid organisms on the basis of their genotype. This information enables researchers to perform association studies for the genetic variants involved in diseases an ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
(Show Context)
Abstract. Haplotype Inference is a challenging problem in bioinformatics that consists in inferring the basic genetic constitution of diploid organisms on the basis of their genotype. This information enables researchers to perform association studies for the genetic variants involved in diseases and the individual responses to therapeutic agents. A notable approach to the problem is to encode it as a combinatorial problem under certain hypotheses (such as the pure parsimony criterion) and to solve it using offtheshelf combinatorial optimization techniques. At present, the main methods applied to Haplotype Inference are either simple greedy heuristic or exact methods, which are adequate only for moderate size instances. In this paper, we present an iterative constructive approach to Haplotype Inference based on ACO and we compare it against a stateoftheart exact method. 1