Results 1 - 10 of 18
On the Approximability of Numerical Taxonomy (Fitting Distances by Tree Metrics)
1995
Cited by 74 (4 self)
We consider the problem of fitting an $n \times n$ distance matrix $D$ by a tree metric $T$. Let $\varepsilon$ be the distance to the closest tree metric, that is, $\varepsilon = \min_T \{\|T, D\|_\infty\}$. First we present an $O(n^2)$ algorithm for finding an additive tree $T$ such that $\|T, D\|_\infty \le 3\varepsilon$, giving the first algorithm for this problem with a performance guarantee. Second we show that it is NP-hard to find a tree $T$ such that $\|T, D\|_\infty < \frac{9}{8}\varepsilon$. DIMACS, Rutgers University, Piscataway, NJ 08855, USA. (agarwala@dimacs.rutgers.edu) Supported by Special Year National Science Foundation grant BIR-9412594. (bafna@dimacs.rutgers.edu) Supported by Special Year National Science Foundation grant BIR-9412594. Department of Computer Science, Rutgers University, Piscataway, NJ 08855, USA. (farach@cs.rutgers.edu, http://www.cs.rutgers.edu/~farach) Supported by NSF Career Development Award CCR-9501942. (bon@dimacs.rutgers.edu) Supported by a DIMACS postdoctoral fellowship under grants STC-88-09648 and 9119999. ...
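The two quantities this abstract relies on, the L∞ distance between two matrices and the property of being a tree metric, can be illustrated with a minimal Python sketch. It is not the paper's approximation algorithm. It assumes the input is a symmetric matrix with zero diagonal that already satisfies the triangle inequality, and it uses the standard four-point condition as the tree-metric test; the names `linf_distance` and `is_tree_metric` are hypothetical.

```python
from itertools import combinations


def linf_distance(T, D):
    """Largest absolute entrywise difference between two square matrices."""
    n = len(D)
    return max(abs(T[i][j] - D[i][j]) for i in range(n) for j in range(n))


def is_tree_metric(D, tol=1e-9):
    """Four-point condition on every set of four distinct points.

    Of the three ways to pair up four points, the two largest pair sums
    must be equal (up to tol). Assumes D is already a metric.
    """
    n = len(D)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([
            D[i][j] + D[k][l],
            D[i][k] + D[j][l],
            D[i][l] + D[j][k],
        ])
        if sums[2] - sums[1] > tol:
            return False
    return True


# A star tree with leaf edge lengths 1, 2, 3, 4: d(i, j) = w[i] + w[j].
w = [1, 2, 3, 4]
star = [[0 if i == j else w[i] + w[j] for j in range(4)] for i in range(4)]

# Perturb one pair of entries so the matrix is no longer additive.
noisy = [row[:] for row in star]
noisy[0][1] += 1
noisy[1][0] += 1

print(is_tree_metric(star), is_tree_metric(noisy), linf_distance(star, noisy))
```

The check enumerates all quadruples, so it takes $O(n^4)$ time and is meant only to make the definitions concrete.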
The Parameterized Complexity of Sequence Alignment and Consensus
1994
Cited by 35 (13 self)
The Longest Common Subsequence problem is examined from the point of view of parameterized computational complexity. There are several different ways in which parameters enter the problem, such as the number of sequences to be analyzed, the length of the common subsequence, and the size of the alphabet. Lower bounds on the complexity of this basic problem imply lower bounds on a number of other sequence alignment and consensus problems. At issue in the theory of parameterized complexity is whether a problem which takes input $(x, k)$ can be solved in time $f(k) \cdot n^{\alpha}$ where $\alpha$ is independent of $k$ (termed fixed-parameter tractability). It can be argued that this is the appropriate asymptotic model of feasible computability for problems for which a small range of parameter values covers important applications, a situation which certainly holds for many problems in biological sequence analysis. Our main results show that: (1) The Longest Common Subsequence (LCS) parameterized by t...
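To make the role of the parameter concrete, here is a minimal Python sketch of the textbook dynamic program for the longest common subsequence of several strings. It is not taken from the paper, and the name `lcs_length` is hypothetical. The table has one state per tuple of prefix lengths, so for $k$ strings of length $n$ it has on the order of $n^k$ states: the exponent grows with $k$, which is exactly the behaviour that the $f(k) \cdot n^{\alpha}$ form excludes.

```python
from functools import lru_cache


def lcs_length(strings):
    """Length of a longest common subsequence of all given strings."""
    strings = tuple(strings)

    @lru_cache(maxsize=None)
    def solve(positions):
        # positions[i] is the length of the prefix of strings[i] in play.
        if any(p == 0 for p in positions):
            return 0
        last_chars = {s[p - 1] for s, p in zip(strings, positions)}
        if len(last_chars) == 1:
            # All prefixes end in the same character: match it.
            return 1 + solve(tuple(p - 1 for p in positions))
        # Otherwise at least one final character is unused: try dropping each.
        return max(
            solve(positions[:i] + (positions[i] - 1,) + positions[i + 1:])
            for i in range(len(positions))
        )

    return solve(tuple(len(s) for s in strings))


print(lcs_length(["ABCBDAB", "BDCABA"]))
print(lcs_length(["abcde", "ace", "aXcYe"]))
```

The recursion depth is bounded by the total length of the inputs, so this sketch is only suitable for short strings.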
On Using Oracles That Compute Values
In Proc. 10th Annual Symp. on Theoret. Aspects of Computer Science, Lecture Notes in Computer Science, 1993
Cited by 16 (6 self)
This paper focuses on complexity classes of partial functions that are computed in polynomial time with oracles in NPMV, the class of all multivalued partial functions that are computable nondeterministically in polynomial time. Concerning deterministic polynomial-time reducibilities, it is shown that (1) a multivalued partial function is polynomial-time computable with $k$ adaptive queries to NPMV if and only if it is polynomial-time computable via $2^k - 1$ nonadaptive queries to NPMV; (2) a characteristic function is polynomial-time computable with $k$ adaptive queries to NPMV if and only if it is polynomial-time computable with $k$ adaptive queries to NP; (3) unless the Boolean hierarchy collapses, for every $k$, $k$ adaptive (nonadaptive) queries to NPMV is different than $k + 1$ adaptive (nonadaptive) queries to NPMV. Nondeterministic reducibilities, lowness and the difference hierarchy over NPMV are also studied. The difference hierarchy for partial functions does not collapse unless the...
Fitting tree metrics: Hierarchical clustering and phylogeny
In Proceedings of the Symposium on Foundations of Computer Science, 2005
Cited by 16 (3 self)
Given dissimilarity data on pairs of objects in a set, we study the problem of fitting a tree metric to this data so as to minimize additive error (i.e. some measure of the difference between the tree metric and the given data). This problem arises in constructing an $M$-level hierarchical clustering of objects (or an ultrametric on objects) so as to match the given dissimilarity data, a basic problem in statistics. Viewed in this way, the problem is a generalization of the correlation clustering problem (which corresponds to $M = 1$). We give a very simple randomized combinatorial algorithm for the $M$-level hierarchical clustering problem that achieves an approximation ratio of $M + 2$. This is a generalization of a previous factor 3 algorithm for correlation clustering on complete graphs. The problem of fitting tree metrics also arises in phylogeny where the objective is to learn the evolution tree by fitting a tree to dissimilarity data on taxa. The quality of the fit is measured by taking the $\ell_p$ norm of the difference between the tree metric constructed and the given data. Previous results obtained a factor 3 approximation for finding the closest tree metric under the $\ell_\infty$ norm. No non-trivial approximation for general $\ell_p$ norms was known before. We present a novel LP formulation for this problem and obtain an $O((\log n \log\log n)^{1/p})$ approximation using this. En route, we obtain an $O((\log n \log\log n)^{1/p})$ approximation for the closest ultrametric under the $\ell_p$ norm. Our techniques are based on representing and viewing an ultrametric as a hierarchy of clusterings, and may be useful in other contexts. Partially supported by a Charlotte Elizabeth Procter Fellowship. Part of this work was done while visiting Microsoft Research.
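For the $M = 1$ case, correlation clustering on a complete graph, a well-known randomized pivot scheme can be sketched in a few lines of Python. We assume, without confirmation from the abstract, that this is the kind of factor 3 algorithm being generalized; the code is not the paper's $M$-level algorithm. The names `pivot_clustering` and `disagreements`, and the boolean predicate `similar(u, v)`, are hypothetical.

```python
import random


def pivot_clustering(nodes, similar, rng=None):
    """Repeatedly pick a random pivot and cluster it with its similar nodes."""
    rng = rng or random.Random()
    remaining = list(nodes)
    clusters = []
    while remaining:
        pivot = rng.choice(remaining)
        cluster = [v for v in remaining if v == pivot or similar(pivot, v)]
        clusters.append(cluster)
        members = set(cluster)
        remaining = [v for v in remaining if v not in members]
    return clusters


def disagreements(clusters, nodes, similar):
    """Count pairs whose placement contradicts the similarity labels."""
    label = {v: idx for idx, cluster in enumerate(clusters) for v in cluster}
    nodes = list(nodes)
    cost = 0
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            u, v = nodes[a], nodes[b]
            if (label[u] == label[v]) != bool(similar(u, v)):
                cost += 1
    return cost


# Two consistent groups: any pivot order recovers them with zero disagreements.
group = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}
same_group = lambda u, v: group[u] == group[v]
result = pivot_clustering(range(5), same_group, random.Random(0))
print(sorted(sorted(c) for c in result), disagreements(result, range(5), same_group))
```

On inconsistent labels the output depends on the random pivots, so the disagreement count should be read as a random variable rather than a fixed value.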
Ancestral Maximum Likelihood of Evolutionary Trees Is Hard
2004
Cited by 12 (2 self)
Maximum likelihood (ML) (Neyman, 1971) is an increasingly popular optimality criterion for selecting evolutionary trees. Finding optimal ML trees appears to be a very hard computational task — in particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for maximum parsimony (MP). However, while MP has been known to be NP-complete for over 20 years, no such hardness result has been obtained so far for ML. In this work we make a first step in this direction by proving that ancestral maximum likelihood (AML) is NP-complete. The input to this problem is a set of aligned sequences of equal length and the goal is to find a tree and an assignment of ancestral sequences for all of that tree’s internal vertices such that the likelihood of generating both the ancestral and contemporary sequences is maximized. Our NP-hardness proof follows that for MP given in (Day, Johnson and Sankoff, 1986) in that we use the same reduction from Vertex Cover; however, the proof of correctness for this reduction relative to AML is different and substantially more involved.
A short proof that phylogenetic tree reconstruction by maximum likelihood is hard
IEEE Trans Comput Biol and Bioinformatics
Cited by 12 (3 self)
Maximum likelihood is one of the most widely used techniques to infer evolutionary histories. Although it is thought to be intractable, a proof of its hardness has been lacking. Here, we give a short proof that computing the maximum likelihood tree is NP-hard by exploiting a connection between likelihood and parsimony observed by Tuffley and Steel.
Oracles That Compute Values
1997
Cited by 11 (4 self)
This paper focuses on complexity classes of partial functions that are computed in polynomial time with oracles in NPMV, the class of all multivalued partial functions that are computable nondeterministically in polynomial time. Concerning deterministic polynomial-time reducibilities, it is shown that (1) a multivalued partial function is polynomial-time computable with $k$ adaptive queries to NPMV if and only if it is polynomial-time computable via $2^k - 1$ nonadaptive queries to NPMV; (2) a characteristic function is polynomial-time computable with $k$ adaptive queries to NPMV if and only if it is polynomial-time computable with $k$ adaptive queries to NP; (3) unless the Boolean hierarchy collapses, for every $k$, $k$ adaptive (nonadaptive) queries to NPMV is different than $k + 1$ adaptive (nonadaptive) queries to NPMV. Nondeterministic reducibilities, lowness and the difference hierarchy over NPMV are also studied. The difference hierarchy for partial functions does not collapse unless the Boolean hierarchy collapses, but, surprisingly, the levels of the difference and bounded query hierarchies do not interleave (as is the case for sets) unless the polynomial hierarchy collapses. Key words. computational complexity, complexity classes, relativized computation, bounded query classes, Boolean hierarchy, multivalued functions, NPMV. AMS subject classifications. 68Q05, 68Q10, 68Q15, 03D10, 03D15.
Sparse Dynamic Programming For Evolutionary-Tree Comparison
SIAM Journal on Computing, 1997
Cited by 11 (1 self)
Constructing evolutionary trees for species sets is a fundamental problem in biology. Unfortunately, there is no single agreed-upon method for this task, and many methods are in use. Current practice dictates that trees be constructed using different methods and that the resulting trees should be compared for consensus. It has become necessary to automate this process as the number of species under consideration has grown. We study one formalization of the problem: the maximum agreement-subtree (MAST) problem. The MAST problem is as follows: given a set $A$ and two rooted trees $T_0$ and $T_1$ leaf-labeled by the elements of $A$, find a maximum-cardinality subset $B$ of $A$ such that the topological restrictions of $T_0$ and $T_1$ to $B$ are isomorphic. In this paper, we will show that this problem reduces to unary weighted bipartite matching (UWBM) with an $O(n^{1+o(1)})$ additive overhead. We also show that UWBM reduces linearly to MAST. Thus our algorithm is optimal unless UWBM can be solv...
Parameterized Complexity Analysis in Computational Biology
Comput. Appl. Biosci., 1995
Cited by 9 (4 self)
Many computational problems in biology involve parameters for which a small range of values covers important applications. We argue that for many problems in this setting, parameterized computational complexity rather than NP-completeness is the appropriate tool for studying apparent intractability. At issue in the theory of parameterized complexity is whether a problem can be solved in time $O(n^{\alpha})$ for each fixed parameter value, where $\alpha$ is a constant independent of the parameter. In addition to surveying this complexity framework, we describe a new result for the Longest Common Subsequence problem. In particular, we show that the problem is hard for $W[t]$ for all $t$ when parameterized by the number of strings and the size of the alphabet. Lower bounds on the complexity of this basic combinatorial problem imply lower bounds on more general sequence alignment and consensus discovery problems. We also describe a number of open problems pertaining to the parameterized complexity of pro...
Numerical Taxonomy on Data: Experimental Results
In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 1997
Cited by 8 (1 self)
We consider the problem of fitting an $n \times n$ distance matrix $D$ by a tree metric $T$. This problem is NP-hard for most reasonable distance functions between $D$ and $T$. Recently, an approximation algorithm was presented [1] which achieves a factor of 3 approximation to the $L_\infty$ best fitting tree. We call this method the Single Pivot (SP) heuristic. Within the biology community, the so-called Neighbor-Joining (NJ) heuristic [14] has wide acceptance. In this paper, we introduce a new Double Pivot (DP) heuristic, which is an extension of the SP heuristic, and show that DP outperforms NJ on biological and random data.

