Haplotyping as Perfect Phylogeny: Conceptual Framework and Efficient Solutions (Extended Abstract)
2002
Cited by 95 (10 self)
The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome [12]. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A prototype Haplotype Mapping strategy is presently being finalized by an NIH working group. The biological key to that strategy is the surprising fact that genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected [12, 6, 21, 7]. In this paper
Efficient reconstruction of haplotype structure via perfect phylogeny
Journal of Bioinformatics and Computational Biology, 2003
Cited by 56 (10 self)
Each person’s genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person’s genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population. Since experimental determination of a person’s genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site, and no recombination occurred at the given region. We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny. Our algorithms have been tested on real data and give biologically meaningful results. Our webserver
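As a small illustration of the perfect phylogeny assumption in the abstract above (not the paper's haplotype-inference algorithm), the following sketch checks whether a set of already-known binary haplotypes is consistent with a perfect phylogeny. It uses the standard four-gamete condition: binary sequences fit a perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10 and 11. The function name and the 0/1 row encoding are assumptions made for this example.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on a list of equal-length 0/1 sequences.

    Hypothetical helper for illustration: returns True when no pair of
    sites exhibits all four gametes (0,0), (0,1), (1,0), (1,1).
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct value pairs seen at sites i and j.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


compatible = [[0, 0], [0, 1], [1, 0]]
conflicting = compatible + [[1, 1]]
print(admits_perfect_phylogeny(compatible))
print(admits_perfect_phylogeny(conflicting))
```

The check only validates given haplotypes; resolving genotypes into haplotypes, which is the problem the paper addresses, requires more than this test.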
Distance realization problems with applications to Internet tomography
Cited by 9 (2 self)
In recent years, a variety of graph optimization problems have arisen in which the graphs involved are much too large for the usual algorithms to be effective. In these cases, even though we are not able to examine the entire graph (which may be changing dynamically), we would still like to deduce various properties of it, such as the size of a connected component, the set of neighbors of a subset of vertices, etc. In this paper, we study a class of problems, called distance realization problems, which arise in the study of Internet data traffic models. Suppose we are given a set S of terminal nodes, taken from some (unknown) weighted graph. A basic problem is to reconstruct a weighted graph G including S with possibly additional vertices, that realizes the given distance matrix for S. We will first show that this problem is not only difficult but the solution is often unstable in the sense that even if all distances between nodes in S decrease, the solution can increase by a factor proport...
A decomposition theory for binary linear codes
submitted to IEEE Trans. Inform. Theory, 2006
Cited by 6 (3 self)
The decomposition theory of matroids initiated by Paul Seymour in the 1980s has had an enormous impact on research in matroid theory. This theory, when applied to matrices over the binary field, yields a powerful decomposition theory for binary linear codes. In this paper, we give an overview of this code decomposition theory, and discuss some of its implications in the context of the recently discovered formulation of maximum-likelihood (ML) decoding of a binary linear code over a discrete memoryless channel as a linear programming problem. We translate matroid-theoretic results of Grötschel and Truemper from the combinatorial optimization literature to give examples of non-trivial families of codes for which the ML decoding problem can be solved in time polynomial in the length of the code. One such family is that consisting of codes C for which the codeword polytope is identical to the Koetter-Vontobel fundamental polytope derived from the entire dual code C⊥. However, we also show that such families of codes are not good in a coding-theoretic sense: either their dimension or their minimum distance must grow sub-linearly with codelength. As a consequence, we have that decoding by linear programming, when applied to good codes, cannot avoid failing occasionally due to the presence of pseudocodewords.
Exact algorithms and applications for Tree-like Weighted Set Cover
Journal of Discrete Algorithms, 2006
Cited by 6 (4 self)
We introduce an NP-complete special case of the Weighted Set Cover problem and show its fixed-parameter tractability with respect to the maximum subset size, a parameter that appears to be small in relevant applications. More precisely, in this practically relevant variant we require that the given collection C of subsets of some base set S should be “tree-like.” That is, the subsets in C can be organized in a tree T such that every subset one-to-one corresponds to a tree node and, for each element s of S, the nodes corresponding to the subsets containing s induce a subtree of T. This is equivalent to the problem of finding a minimum edge cover in an edge-weighted acyclic hypergraph. Our main result is an algorithm running in $O(3^k \cdot mn)$ time where k denotes the maximum subset size, n := |S|, and m := |C|. The algorithm also implies a fixed-parameter tractability result for the NP-complete Multicut in Trees problem, complementing previous approximation results. Our results find applications in computational biology in phylogenomics and for saving memory in tree decomposition based graph algorithms.
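The "tree-like" condition in the abstract above can be checked directly once a candidate tree T is given. The sketch below is only a verifier under that assumption (it does not search for T and is not the paper's set cover algorithm). It relies on the fact that any induced subgraph of a tree is a forest, so a set of nodes induces a connected subtree exactly when it spans one edge fewer than it has nodes. The function name and the input encoding are assumptions made for this example.

```python
def is_tree_like(subsets, tree_edges):
    """Verify the tree-like property for a given arrangement.

    subsets:    dict mapping each tree node to the subset placed on it
    tree_edges: list of (node, node) pairs, assumed to form a tree on
                exactly the keys of `subsets` (not checked here)
    """
    elements = set().union(*subsets.values())
    for s in elements:
        # Nodes whose subset contains the element s.
        nodes = {v for v, subset in subsets.items() if s in subset}
        # Tree edges with both endpoints among those nodes.
        inside = sum(1 for u, v in tree_edges if u in nodes and v in nodes)
        # A forest on len(nodes) vertices is connected iff it has len(nodes) - 1 edges.
        if inside != len(nodes) - 1:
            return False
    return True


path = [(1, 2), (2, 3)]
print(is_tree_like({1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}, path))
print(is_tree_like({1: {"a"}, 2: {"b"}, 3: {"a"}}, path))
```

In the second call the element "a" sits on nodes 1 and 3 but not on node 2 between them, so the arrangement is rejected.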
What is a matroid?
2007
Cited by 5 (0 self)
Matroids were introduced by Whitney in 1935 to try to capture abstractly the essence of dependence. Whitney’s definition embraces a surprising diversity of combinatorial structures. Moreover, matroids arise naturally in combinatorial optimization since they are precisely the structures for which the greedy algorithm works. This survey paper introduces matroid theory, presents some of the main theorems in the subject, and identifies some of the major problems of current research interest.
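The remark above that matroids are the structures on which the greedy algorithm works can be made concrete with a short sketch. The code assumes the matroid is available through an independence oracle (a callable, here named `is_independent`, which is an assumption of this example) and uses the graphic matroid of a small graph, where independent sets are forests, as the test case. On that matroid the greedy procedure is Kruskal's algorithm for a maximum-weight spanning forest.

```python
def greedy_max_weight_basis(elements, weight, is_independent):
    """Greedy algorithm over an independence oracle (illustrative helper)."""
    basis = []
    # Scan elements from heaviest to lightest, keeping each one that
    # preserves independence.
    for e in sorted(elements, key=weight, reverse=True):
        if is_independent(basis + [e]):
            basis.append(e)
    return basis


def is_forest(edges):
    """Independence oracle of a graphic matroid: True if edges are acyclic."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # edge would close a cycle
        parent[ru] = rv
    return True


weights = {("a", "b"): 3, ("b", "c"): 2, ("a", "c"): 1}
basis = greedy_max_weight_basis(list(weights), weights.get, is_forest)
print(basis)
```

On the triangle above, the two heaviest edges are kept and the lightest is rejected because it would close a cycle.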
Matchings, Matroids and Unimodular Matrices
1995
Cited by 4 (0 self)
We focus on combinatorial problems arising from symmetric and skew-symmetric matrices. For much of the thesis we consider properties concerning the principal submatrices. In particular, we are interested in the property that every nonsingular principal submatrix is unimodular; matrices having this property are called principally unimodular. Principal unimodularity is a generalization of total unimodularity, and we generalize key polyhedral and matroidal results on total unimodularity. Highlights include a generalization of Hoffman and Kruskal's result on integral polyhedra, a generalization of Tutte's results on regular matroids, and partial results toward a decomposition theorem. Quite separate from the study of principal unimodularity we consider a particular skew-symmetric matrix of indeterminates associated with a graph. This matrix, called the Tutte matrix, was introduced by Tutte to study matchings. By considering the rank of an arbitrary submatrix of the Tutte matrix we disco...
Integer Programming Models for Ground-Holding in Air Traffic Flow Management
Cited by 4 (0 self)
In this dissertation, integer programming models are applied to combinatorial problems in air traffic flow management. For the two problems studied, models are developed and analyzed both theoretically and computationally. This dissertation makes contributions to integer programming while providing efficient tools for solving air traffic flow management problems. Currently, a constrained arrival capacity situation at an airport in the United States is alleviated by holding inbound aircraft at their departure gates. The ground holding problem (GH) decides which aircraft to hold on the ground and for how long. This dissertation examines the GH from two perspectives. First, the hubbing operations of the airlines are considered by adding side constraints to GH. These constraints enforce the desire of the airlines to temporally group banks of flights. Five basic models and several variations of the ground holding problem with banking constraints (GHB) are presented. A particularly strong, facet-inducing model of the banking constraints is presented which allows one to
Multicommodity Flows and Approximation Algorithms
1994
Cited by 3 (0 self)
This thesis is about multicommodity flows and their use in designing approximation algorithms for problems involving cuts in graphs. In a ground-breaking work Leighton and Rao [34] showed an approximate max-flow min-cut theorem for uniform multicommodity flow and used this to obtain an approximation algorithm for the flux of a graph. We consider the multicommodity flow problem in which the object is to maximize the sum of the flows routed and prove the following approximate max-flow min-multicut theorem: $\frac{\text{min-multicut}}{O(\log k)} \le \text{max-flow} \le \text{min-multicut}$, where k is the number of commodities. Our proof is based on a rounding technique from [34]. Further, we show that this theorem is tight. For a multicommodity flow instance with specified demands, the ratio of the maximum concurrent flow to the sparsest cut was shown to be bounded by $O(\log^2 k)$ [30, 57, 17, 47]. We use ideas from our proof of the approximate max-flow min-multicut theorem and a geometric scaling technique from [1] to provi...
Independence and port oracles for matroids, with an application to computational learning theory
Combinatorica, 1996
Cited by 3 (0 self)
Given a matroid M with distinguished element e, a port oracle with respect to e reports whether or not a given subset contains a circuit that contains e. The first main result of this paper is an algorithm for computing an e-based ear decomposition (that is, an ear decomposition every circuit of which contains element e) of a matroid using only a polynomial number of elementary operations and port oracle calls. In the case that M is binary, the incidence vectors of the circuits in the ear decomposition form a matrix representation for M. Thus, this algorithm solves a problem in computational learning theory; it learns the class of binary matroid port (BMP) functions with membership queries in polynomial time. In this context, the algorithm generalizes results of Angluin, Hellerstein, and Karpinski [1], and Raghavan and Schach [17], who showed that certain subclasses of the BMP functions are learnable in polynomial time using membership queries. The second main result of this paper is an algorithm for testing independence of a given input set of the matroid M. This algorithm, which uses the ear decomposition algorithm as a subroutine, uses only a polynomial number of elementary operations and port oracle calls. The algorithm proves a constructive version of an early theorem of Lehman [13], which states that the port of a connected matroid uniquely determines the matroid.

