Results 1 - 10 of 38
Integer programming approaches to haplotype inference by pure parsimony
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2006
Cited by 43 (2 self)
In 2003, Gusfield introduced the Haplotype Inference by Pure Parsimony (HIPP) problem and presented an integer program (IP) that quickly solved many simulated instances of the problem [1]. Although it solved well on small instances, Gusfield’s IP can be of exponential size in the worst case. Several authors [2], [3] have presented polynomial-sized IPs for the problem. In this paper, we further the work on IP approaches to HIPP. We extend the existing polynomial-sized IPs by introducing several classes of valid cuts for the IP. We also present a new polynomial-sized IP formulation that is a hybrid between two existing IP formulations and inherits many of the strengths of both. Many problems that are too complex for the exponential-sized formulations can still be solved in our new formulation in a reasonable amount of time. We provide a detailed empirical comparison of these IP formulations on both simulated and real genotype sequences. Our formulation can also be extended in a variety of ways to allow errors in the input or model the structure of the population under consideration. Index Terms: Computations on discrete structures, integer programming, biology and genetics, haplotype inference.
A linear-time algorithm for the perfect phylogeny haplotyping (PPH) problem
In International Conference on Research in Computational Molecular Biology (RECOMB), 2005
Cited by 30 (7 self)
Since the introduction of the Perfect Phylogeny Haplotyping (PPH) Problem in RECOMB 2002 (Gusfield, 2002), the problem of finding a linear-time (deterministic, worst-case) solution for it has remained open, despite broad interest in the PPH problem and a series of papers on various aspects of it. In this paper, we solve the open problem, giving a practical, deterministic linear-time algorithm based on a simple data structure and simple operations on it. The method is straightforward to program and has been fully implemented. Simulations show that it is much faster in practice than prior nonlinear methods. The value of a linear-time solution to the PPH problem is partly conceptual and partly for use in the inner loop of algorithms for more complex problems, where the PPH problem must be solved repeatedly. Key words: Perfect Phylogeny Haplotyping (PPH) Problem, Haplotype Inference Problem, linear-time algorithm, shadow tree.
Computing the Minimum Recombinant Haplotype Configuration from incomplete genotype data on a pedigree by integer linear programming
Journal of Computational Biology, 2005
Cited by 26 (9 self)
We study the problem of reconstructing haplotype configurations from genotypes on pedigree data with missing alleles under the Mendelian law of inheritance and the minimum recombination principle, which is important for the construction of haplotype maps and genetic linkage/association analyses. Our previous results show that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. The existing algorithms for MRHC either are heuristic in nature and cannot guarantee optimality, or only work under some restrictions (on e.g. the size and structure of the input pedigree, the number of marker loci, the number of recombinants in the pedigree, etc.). In addition, most of them cannot handle data with missing alleles and, for those that do consider missing data, they usually do not perform well in terms of minimizing the number of recombinants when a significant fraction of alleles are missing. This paper presents an effective integer linear programming (ILP) formulation of the MRHC problem with missing data and a branch-and-bound strategy that utilizes a partial order relationship and some other special relationships among variables to decide the branching order. The partial order relationship is discovered in the preprocessing of constraints by considering
Fast Elimination of Redundant Linear Equations and Reconstruction of Recombination-Free Mendelian Inheritance on a Pedigree
Proc. of 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’07), 2007
Efficient haplotype inference with pseudo-Boolean optimization
In Algebraic Biology 2007, 2007
Cited by 17 (7 self)
Haplotype inference from genotype data is a key computational problem in bioinformatics, since directly retrieving haplotype information from DNA samples is not feasible using existing technology. One of the methods for solving this problem uses the pure parsimony criterion, an approach known as Haplotype Inference by Pure Parsimony (HIPP). Initial work in this area was based on a number of different Integer Linear Programming (ILP) models and branch-and-bound algorithms. Recent work has shown that the utilization of a Boolean Satisfiability (SAT) formulation and state-of-the-art SAT solvers represents the most efficient approach for solving the HIPP problem. Motivated by the promising results obtained using SAT techniques, this paper investigates the utilization of modern Pseudo-Boolean Optimization (PBO) algorithms for solving the HIPP problem. The paper starts by applying PBO to existing ILP models. The results are promising, and motivate the development of a new PBO model (RPoly) for the HIPP problem, which has a compact representation and eliminates key symmetries. Experimental results indicate that RPoly outperforms the SAT-based approach on most problem instances, being, in general, significantly more efficient.
BNTagger: improved tagging SNP selection using Bayesian networks
Bioinformatics, 2006
Cited by 13 (3 self)
Genetic variation analysis holds much promise as a basis for disease-gene association. However, due to the tremendous number of candidate single nucleotide polymorphisms (SNPs), there is a clear need to expedite genotyping by selecting and considering only a subset of all SNPs. This process is known as tagging SNP selection. Several methods for tagging SNP selection have been proposed, and have shown promising results. However, most of them rely on strong assumptions such as prior block-partitioning, bi-allelic SNPs, or a fixed number or location of tagging SNPs. We introduce BNTagger, a new method for tagging SNP selection, based on conditional independence among SNPs. Using the formalism of Bayesian networks (BNs), our system aims to select a subset of independent and highly predictive SNPs. Similar to previous prediction-based methods, we aim to maximize the prediction accuracy of tagging SNPs, but unlike them, we neither fix the number nor the location of predictive tagging SNPs, nor require SNPs to be bi-allelic. In addition, for newly genotyped samples, BNTagger directly uses genotype data as input, while producing as output haplotype data of all SNPs. Using three public data sets, we compare the prediction performance of our method to that of three state-of-the-art tagging SNP selection methods. The results demonstrate that our method consistently improves upon previous methods in terms of prediction accuracy. Moreover, our method retains its good performance even when a very small number of tagging SNPs are used.
Islands of Tractability for Parsimony Haplotyping
Proc. IEEE Computational Systems Bioinformatics Conf, 2005
Cited by 13 (2 self)
We study the parsimony approach to haplotype inference, which calls for finding a set of haplotypes of minimum cardinality that explains an input set of genotypes. We prove that the problem is APX-hard even in very restricted cases. On the positive side, we identify islands of tractability for the problem, by focusing on instances with specific structure of haplotype sharing among the input genotypes. We exploit the structure of those instances to give polynomial and constant-approximation algorithms for the problem. We also show that the general parsimony haplotyping problem is fixed-parameter tractable.
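To make the parsimony objective concrete, the following is a minimal brute-force sketch, not the method of any paper listed here. It assumes a common encoding that the abstracts do not spell out: each genotype is a tuple over {0, 1, 2}, where 0 and 1 are homozygous sites and 2 marks a heterozygous site, and a pair of 0/1 haplotypes explains a genotype when both agree with it at homozygous sites and differ at heterozygous sites. The function names are hypothetical, and the search is exponential, so it only suits tiny inputs.

```python
from itertools import combinations, product

def expand(genotype):
    """All haplotypes compatible with one genotype (2 = heterozygous site)."""
    options = [(0, 1) if g == 2 else (g,) for g in genotype]
    return set(product(*options))

def explains(h1, h2, genotype):
    """True if the haplotype pair (h1, h2) resolves the genotype."""
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:          # heterozygous site needs differing alleles
                return False
        elif not (a == b == g):  # homozygous site needs both alleles equal to g
            return False
    return True

def hipp_brute_force(genotypes):
    """Smallest haplotype set explaining every genotype (assumed encoding)."""
    # Only haplotypes compatible with some genotype can ever be useful.
    candidates = sorted(set().union(*(expand(g) for g in genotypes)))
    # Try set sizes in increasing order, so the first hit is minimum.
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if all(
                any(explains(h1, h2, g) for h1 in subset for h2 in subset)
                for g in genotypes
            ):
                return list(subset)
    return []
```

For example, `hipp_brute_force([(2, 2), (0, 2), (0, 0)])` needs three haplotypes: `(0, 0)` and `(0, 1)` are forced by the second genotype, and the first genotype then requires one more.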
Boosting Haplotype Inference with Local Search
Cited by 7 (2 self)
A very challenging problem in the genetics domain is to infer haplotypes from genotypes. This process is expected to identify genes affecting health, disease and response to drugs. One of the approaches to haplotype inference aims to minimise the number of different haplotypes used, and is known as haplotype inference by pure parsimony (HIPP). The HIPP problem is computationally difficult, being NP-hard. Recently, a SAT-based method (SHIPs) has been proposed to solve the HIPP problem. This method iteratively considers an increasing number of haplotypes, starting from an initial lower bound. Hence, one important aspect of SHIPs is the lower bounding procedure, which reduces the number of iterations of the basic algorithm, and also indirectly simplifies the resulting SAT model. This paper describes the use of local search to improve existing lower bounding procedures. The new lower bounding procedure is guaranteed to be as tight as the existing procedures. In practice the new procedure is in most cases considerably tighter, allowing significant improvement of performance on challenging problem instances.
Minimum multicolored subgraph problem in multiplex PCR primer set selection and population haplotyping
In Proceedings of the 6th International Conference on Computational Science (ICCS), 2006
Cited by 6 (0 self)
In this paper we consider the minimum weight multicolored subgraph problem (MWMCSP), which is a common generalization of minimum cost multiplex PCR primer set selection and maximum likelihood population haplotyping. In this problem one is given an undirected graph G with nonnegative vertex weights and a color function that assigns to each edge one or more of n given colors, and the goal is to find a minimum weight set of vertices inducing edges of all n colors. We obtain improved approximation algorithms and hardness results for MWMCSP and its variant in which the goal is to find a minimum number of vertices inducing edges of at least k colors for a given integer k ≤ n.
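The MWMCSP definition can be illustrated with a small exhaustive search. This is only a sketch of the problem statement under assumed input conventions (a weight dictionary, edges as `(u, v, colors)` triples, colors numbered from 0 to n-1); it is not the approximation algorithm from the paper, and the function name is hypothetical. It enumerates every vertex subset, so it is practical only for very small graphs.

```python
from itertools import combinations

def min_weight_multicolored_subgraph(weights, edges, n_colors):
    """Exhaustive MWMCSP solver for tiny graphs (illustrative only).

    weights:  dict mapping vertex -> nonnegative weight
    edges:    list of (u, v, colors), colors being a set of ints in range(n_colors)
    Returns (total_weight, vertex_set), or None if no subset covers all colors.
    """
    vertices = sorted(weights)
    wanted = set(range(n_colors))
    best = None
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            chosen = set(subset)
            covered = set()
            # An edge is induced only when both endpoints are chosen.
            for u, v, colors in edges:
                if u in chosen and v in chosen:
                    covered |= set(colors)
            if wanted <= covered:
                total = sum(weights[x] for x in subset)
                if best is None or total < best[0]:
                    best = (total, chosen)
    return best
```

With weights `{"a": 1, "b": 1, "c": 5, "d": 1}` and edges `a-b` (color 0), `b-c` (color 1), `b-d` (color 1), `a-c` (colors 0 and 1), the cheapest set covering both colors is `{"a", "b", "d"}` with weight 3, even though `{"a", "c"}` uses fewer vertices.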