Results 1-10 of 18
On the Approximability of Numerical Taxonomy (Fitting Distances by Tree Metrics)
, 1995
Abstract

Cited by 81 (4 self)
We consider the problem of fitting an $n \times n$ distance matrix $D$ by a tree metric $T$. Let $\varepsilon$ be the distance to the closest tree metric, that is, $\varepsilon = \min_T \{\|T - D\|_\infty\}$. First we present an $O(n^2)$ algorithm for finding an additive tree $T$ such that $\|T - D\|_\infty \le 3\varepsilon$, giving the first algorithm for this problem with a performance guarantee. Second we show that it is NP-hard to find a tree $T$ such that $\|T - D\|_\infty < \frac{9}{8}\varepsilon$.
∗ DIMACS, Rutgers University, Piscataway, NJ 08855, USA (agarwala@dimacs.rutgers.edu). Supported by Special Year National Science Foundation grant BIR9412594. † (bafna@dimacs.rutgers.edu) Supported by Special Year National Science Foundation grant BIR9412594. ‡ Department of Computer Science, Rutgers University, Piscataway, NJ 08855, USA (farach@cs.rutgers.edu, http://www.cs.rutgers.edu/~farach). Supported by NSF Career Development Award CCR9501942. § (bon@dimacs.rutgers.edu) Supported by a DIMACS postdoctoral fellowship under grants STC8809648 and 9119999. ...
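The quantity being approximated can be made concrete. The sketch below is our own illustration, not code from the paper: it computes the ℓ∞ error $\|T - D\|_\infty$ between two matrices and tests whether a matrix is a tree metric via the classical four-point condition. It assumes matrices are symmetric lists of lists with a zero diagonal; `linf_distance` and `is_tree_metric` are hypothetical names.

```python
from itertools import combinations_with_replacement


def linf_distance(T, D):
    """l_inf error max_{i,j} |T[i][j] - D[i][j]| between two n x n matrices."""
    n = len(D)
    return max(abs(T[i][j] - D[i][j]) for i in range(n) for j in range(n))


def is_tree_metric(D, tol=1e-9):
    """Four-point condition: for every (multi)set of four points i, j, k, l, the two
    largest of D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k] are equal.
    Assumes D is symmetric, nonnegative, with a zero diagonal."""
    n = len(D)
    # Quadruples with a repeated point also enforce the triangle inequality.
    for i, j, k, l in combinations_with_replacement(range(n), 4):
        sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
        if sums[2] - sums[1] > tol:
            return False
    return True
```

A star tree with pendant edge weights w_i gives D[i][j] = w_i + w_j, which passes the check, while the shortest-path metric of a 4-cycle fails it. The brute-force O(n^4) loop is only meant for small examples.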
The Parameterized Complexity of Sequence Alignment and Consensus
, 1994
Abstract

Cited by 36 (13 self)
The longest common subsequence problem is examined from the point of view of parameterized computational complexity. There are several different ways in which parameters enter the problem, such as the number of sequences to be analyzed, the length of the common subsequence, and the size of the alphabet. Lower bounds on the complexity of this basic problem imply lower bounds on a number of other sequence alignment and consensus problems. At issue in the theory of parameterized complexity is whether a problem which takes input $(x, k)$ can be solved in time $f(k) \cdot n^{\alpha}$, where $\alpha$ is independent of $k$ (termed fixed-parameter tractability). It can be argued that this is the appropriate asymptotic model of feasible computability for problems for which a small range of parameter values covers important applications, a situation which certainly holds for many problems in biological sequence analysis. Our main results show that: (1) The Longest Common Subsequence (LCS) parameterized by t...
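As a baseline for why the number of sequences matters as a parameter, here is a minimal sketch (our own, not from the paper) of the textbook dynamic program for the LCS of k sequences. Its table has ∏(|s_i| + 1) cells, so for k sequences of length n it takes roughly n^k time: polynomial for each fixed k, but with an exponent that grows with k, which is exactly what the $f(k) \cdot n^{\alpha}$ form rules out. `lcs_length` is a hypothetical name.

```python
from itertools import product


def lcs_length(*seqs):
    """Length of a longest common subsequence of one or more sequences.
    The DP table has prod(len(s) + 1) cells, i.e. O(n^k) for k sequences of length n."""
    shape = [len(s) + 1 for s in seqs]
    table = {}
    # product() yields indices in lexicographic order, so every predecessor is filled first.
    for idx in product(*(range(m) for m in shape)):
        if 0 in idx:
            table[idx] = 0  # some prefix is empty
            continue
        chars = [s[i - 1] for s, i in zip(seqs, idx)]
        if all(c == chars[0] for c in chars):
            # All last characters agree: extend the LCS of the shorter prefixes.
            table[idx] = table[tuple(i - 1 for i in idx)] + 1
        else:
            # Otherwise drop the last character of one sequence at a time.
            table[idx] = max(
                table[idx[:d] + (idx[d] - 1,) + idx[d + 1:]] for d in range(len(idx))
            )
    return table[tuple(m - 1 for m in shape)]
```

For example, `lcs_length("ABCBDAB", "BDCABA")` is 4 (one witness is "BCBA").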
A short proof that phylogenetic tree reconstruction by maximum likelihood is hard
 IEEE Trans Comput Biol and Bioinformatics
Abstract

Cited by 23 (3 self)
Maximum likelihood is one of the most widely used techniques to infer evolutionary histories. Although it is thought to be intractable, a proof of its hardness has been lacking. Here, we give a short proof that computing the maximum likelihood tree is NP-hard by exploiting a connection between likelihood and parsimony observed by Tuffley and Steel.
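The parsimony side of that connection is easy to compute on a fixed tree. The following sketch uses Fitch's algorithm for the small parsimony problem on a rooted binary tree; it is our illustration, not the reduction from the paper. Trees are assumed to be nested pairs of leaf names, and `fitch_score` and `parsimony_score` are hypothetical helpers.

```python
def fitch_score(tree, states):
    """Fitch's algorithm for one character on a rooted binary tree.
    tree: a leaf name (str) or a pair (left, right); states: dict leaf -> state.
    Returns (candidate state set at the root, minimum number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    s1, c1 = fitch_score(left, states)
    s2, c2 = fitch_score(right, states)
    common = s1 & s2
    if common:
        return common, c1 + c2          # children can agree: no extra change
    return s1 | s2, c1 + c2 + 1         # disagreement costs one change


def parsimony_score(tree, alignment):
    """Sum of Fitch scores over all sites; alignment: dict leaf -> aligned sequence."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {leaf: seq[i] for leaf, seq in alignment.items()})[1]
        for i in range(length)
    )
```

On the tree ((a, b), (c, d)) with sequences a = "AA", b = "AC", c = "GC", d = "GA", the first site needs one change and the second needs two, for a parsimony score of 3.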
Fitting tree metrics: Hierarchical clustering and phylogeny
 In Proceedings of the Symposium on Foundations of Computer Science
, 2005
Abstract

Cited by 18 (3 self)
Given dissimilarity data on pairs of objects in a set, we study the problem of fitting a tree metric to this data so as to minimize additive error (i.e. some measure of the difference between the tree metric and the given data). This problem arises in constructing an M-level hierarchical clustering of objects (or an ultrametric on objects) so as to match the given dissimilarity data, a basic problem in statistics. Viewed in this way, the problem is a generalization of the correlation clustering problem (which corresponds to M = 1). We give a very simple randomized combinatorial algorithm for the M-level hierarchical clustering problem that achieves an approximation ratio of M + 2. This is a generalization of a previous factor-3 algorithm for correlation clustering on complete graphs. The problem of fitting tree metrics also arises in phylogeny, where the objective is to learn the evolution tree by fitting a tree to dissimilarity data on taxa. The quality of the fit is measured by taking the ℓp norm of the difference between the tree metric constructed and the given data. Previous results obtained a factor-3 approximation for finding the closest tree metric under the ℓ∞ norm. No nontrivial approximation for general ℓp norms was known before. We present a novel LP formulation for this problem and obtain an $O((\log n \log\log n)^{1/p})$ approximation using it. En route, we obtain an $O((\log n \log\log n)^{1/p})$ approximation for the closest ultrametric under the ℓp norm. Our techniques are based on representing and viewing an ultrametric as a hierarchy of clusterings, and may be useful in other contexts.
∗ Partially supported by a Charlotte Elizabeth Procter Fellowship. Part of this work was done while visiting Microsoft Research.
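For the M = 1 case, the abstract builds on a factor-3 algorithm for correlation clustering on complete graphs. Assuming that is the well-known randomized pivot method (pick a random unclustered node, cluster it with every remaining node it is marked similar to, repeat), a minimal sketch looks like this. It is not the M-level algorithm of the paper, and `pivot_correlation_clustering`, `disagreements`, and the `similar` predicate are hypothetical names.

```python
import random


def pivot_correlation_clustering(nodes, similar, rng=None):
    """Randomized pivot clustering on a complete graph whose edges are labelled
    similar(u, v) -> True ('+') or False ('-')."""
    rng = rng or random.Random()
    remaining = list(nodes)
    clusters = []
    while remaining:
        pivot = rng.choice(remaining)
        # The pivot takes every remaining '+' neighbour into its cluster.
        cluster = [v for v in remaining if v == pivot or similar(pivot, v)]
        clusters.append(cluster)
        remaining = [v for v in remaining if v not in cluster]
    return clusters


def disagreements(clusters, similar):
    """Correlation-clustering cost: '+' pairs split apart plus '-' pairs put together."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    nodes = list(label)
    count = 0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if similar(u, v) != (label[u] == label[v]):
                count += 1
    return count
```

On perfectly consistent data (the '+' relation is already an equivalence relation) every choice of pivots recovers the true clusters with zero disagreements; on noisy data the output depends on the random pivots.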
On Using Oracles That Compute Values
 In Proc. 10th Annual Symp. on Theoret. Aspects of Computer Science, Lecture Notes in Computer Science
, 1993
Abstract

Cited by 17 (6 self)
This paper focuses on complexity classes of partial functions that are computed in polynomial time with oracles in NPMV, the class of all multivalued partial functions that are computable nondeterministically in polynomial time. Concerning deterministic polynomial-time reducibilities, it is shown that:
1. A multivalued partial function is polynomial-time computable with k adaptive queries to NPMV if and only if it is polynomial-time computable via $2^k - 1$ nonadaptive queries to NPMV.
2. A characteristic function is polynomial-time computable with k adaptive queries to NPMV if and only if it is polynomial-time computable with k adaptive queries to NP.
3. Unless the Boolean hierarchy collapses, for every k, k adaptive (nonadaptive) queries to NPMV are not equivalent to k + 1 adaptive (nonadaptive) queries to NPMV.
Nondeterministic reducibilities, lowness and the difference hierarchy over NPMV are also studied. The difference hierarchy for partial functions does not collapse unless the...
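Result 1 echoes the classical query-tree argument for sets: k adaptive yes/no queries can be replaced by asking, in one parallel round, every query the adaptive machine could reach, one per answer prefix of length less than k, i.e. 1 + 2 + ... + 2^(k-1) = 2^k - 1 queries. The sketch below illustrates only that Boolean case, under our own hypothetical interface (`next_query`, `decide`, `oracle`); it does not capture the NPMV setting, where oracle answers are values rather than bits.

```python
def run_adaptive(x, k, next_query, decide, oracle):
    """Make k adaptive yes/no queries; query i may depend on answers 0..i-1."""
    answers = ()
    for _ in range(k):
        answers += (oracle(next_query(x, answers)),)
    return decide(x, answers)


def run_nonadaptive(x, k, next_query, decide, oracle):
    """Precompute the query for every answer prefix of length < k (2**k - 1 of them),
    ask them all in one round, then replay the adaptive run from the stored replies.
    Returns (result, number of queries asked)."""
    prefixes, all_prefixes = [()], []
    for _ in range(k):
        all_prefixes.extend(prefixes)
        prefixes = [p + (b,) for p in prefixes for b in (False, True)]
    queries = {p: next_query(x, p) for p in all_prefixes}
    replies = {p: oracle(q) for p, q in queries.items()}  # one nonadaptive round
    answers = ()
    for _ in range(k):
        answers += (replies[answers],)
    return decide(x, answers), len(queries)
```

Both runs produce identical answer sequences; the nonadaptive one pays for this with exponentially many (but only 2^k - 1) queries.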
ANCESTRAL MAXIMUM LIKELIHOOD OF EVOLUTIONARY TREES IS HARD
, 2004
Abstract

Cited by 15 (3 self)
Maximum likelihood (ML) (Neyman, 1971) is an increasingly popular optimality criterion for selecting evolutionary trees. Finding optimal ML trees appears to be a very hard computational task; in particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for maximum parsimony (MP). However, while MP has been known to be NP-complete for over 20 years, no such hardness result has been obtained so far for ML. In this work we make a first step in this direction by proving that ancestral maximum likelihood (AML) is NP-complete. The input to this problem is a set of aligned sequences of equal length and the goal is to find a tree and an assignment of ancestral sequences for all of that tree’s internal vertices such that the likelihood of generating both the ancestral and contemporary sequences is maximized. Our NP-hardness proof follows that for MP given in (Day, Johnson and Sankoff, 1986) in that we use the same reduction from Vertex Cover; however, the proof of correctness for this reduction relative to AML is different and substantially more involved.
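To make the AML objective concrete, the sketch below scores one fully labelled candidate (a fixed tree plus ancestral sequences). It assumes binary characters under the Neyman two-state symmetric model with a uniform root distribution, which the abstract does not spell out; each edge's change probability is set to its maximum-likelihood value p_e = min(h_e / L, 1/2), where h_e is the Hamming distance along the edge and L the sequence length. AML maximizes this score over trees and ancestral assignments, which is the hard part and is not attempted here. `aml_log_likelihood` is a hypothetical name.

```python
import math


def aml_log_likelihood(edges, seqs):
    """Log-likelihood of a fully labelled tree under an assumed two-state symmetric
    (Neyman) model. edges: list of (u, v) vertex pairs; seqs: dict vertex -> binary
    string of length L, for leaves and internal vertices alike."""
    L = len(next(iter(seqs.values())))
    total = L * math.log(0.5)  # uniform root distribution over {0, 1}^L
    for u, v in edges:
        h = sum(a != b for a, b in zip(seqs[u], seqs[v]))
        # MLE of the change probability, constrained to [0, 1/2].
        p = min(h / L, 0.5)
        if h > 0:
            total += h * math.log(p)
        if L - h > 0:
            total += (L - h) * math.log(1 - p)
    return total
```

The `h > 0` and `L - h > 0` guards implement the convention 0 · log 0 = 0, so identical endpoint sequences contribute nothing.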
Sparse Dynamic Programming for Evolutionary-Tree Comparison
 SIAM Journal on Computing
, 1997
Abstract

Cited by 12 (1 self)
Constructing evolutionary trees for species sets is a fundamental problem in biology. Unfortunately, there is no single agreed-upon method for this task, and many methods are in use. Current practice dictates that trees be constructed using different methods and that the resulting trees be compared for consensus. It has become necessary to automate this process as the number of species under consideration has grown. We study one formalization of the problem: the maximum agreement-subtree (MAST) problem. The MAST problem is as follows: given a set A and two rooted trees $T_0$ and $T_1$ leaf-labeled by the elements of A, find a maximum-cardinality subset B of A such that the topological restrictions of $T_0$ and $T_1$ to B are isomorphic. In this paper, we will show that this problem reduces to unary weighted bipartite matching (UWBM) with an $O(n^{1+o(1)})$ additive overhead. We also show that UWBM reduces linearly to MAST. Thus our algorithm is optimal unless UWBM can be solv...
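For intuition about what MAST computes, here is the classical dynamic program for two rooted binary trees (our illustration; it is neither the sparse dynamic program nor the UWBM reduction from the paper). For internal nodes u = (u1, u2) and v = (v1, v2), an agreement subtree either matches the children pairwise or lies entirely below one child. Trees are assumed to be nested pairs of leaf labels; `leaves` and `mast_size` are hypothetical names.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def leaves(t):
    """Leaf labels below t; a tree is a leaf label (str) or a pair (left, right)."""
    if isinstance(t, str):
        return frozenset([t])
    return leaves(t[0]) | leaves(t[1])


def mast_size(t1, t2):
    """Size of a maximum agreement subtree of two rooted binary trees."""
    @lru_cache(maxsize=None)
    def m(u, v):
        if isinstance(u, str):
            return 1 if u in leaves(v) else 0
        if isinstance(v, str):
            return 1 if v in leaves(u) else 0
        (u1, u2), (v1, v2) = u, v
        return max(
            m(u1, v1) + m(u2, v2),  # children matched straight
            m(u1, v2) + m(u2, v1),  # children matched crossed
            m(u1, v), m(u2, v),     # agreement subtree lies below one child of u
            m(u, v1), m(u, v2),     # ... or below one child of v
        )
    return m(t1, t2)
```

For ((a, b), (c, d)) and ((a, c), (b, d)), every three-leaf restriction differs, so the answer is 2.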
Oracles That Compute Values
, 1997
Abstract

Cited by 11 (4 self)
This paper focuses on complexity classes of partial functions that are computed in polynomial time with oracles in NPMV, the class of all multivalued partial functions that are computable nondeterministically in polynomial time. Concerning deterministic polynomial-time reducibilities, it is shown that:
1. A multivalued partial function is polynomial-time computable with k adaptive queries to NPMV if and only if it is polynomial-time computable via $2^k - 1$ nonadaptive queries to NPMV.
2. A characteristic function is polynomial-time computable with k adaptive queries to NPMV if and only if it is polynomial-time computable with k adaptive queries to NP.
3. Unless the Boolean hierarchy collapses, for every k, k adaptive (nonadaptive) queries to NPMV are not equivalent to k + 1 adaptive (nonadaptive) queries to NPMV.
Nondeterministic reducibilities, lowness and the difference hierarchy over NPMV are also studied. The difference hierarchy for partial functions does not collapse unless the Boolean hierarchy collapses, but, surprisingly, the levels of the difference and bounded query hierarchies do not interleave (as is the case for sets) unless the polynomial hierarchy collapses.
Key words. computational complexity, complexity classes, relativized computation, bounded query classes, Boolean hierarchy, multivalued functions, NPMV
AMS subject classifications. 68Q05, 68Q10, 68Q15, 03D10, 03D15
Parameterized Complexity Analysis in Computational Biology
 Comput. Appl. Biosci
, 1995
Abstract

Cited by 9 (4 self)
Many computational problems in biology involve parameters for which a small range of values covers important applications. We argue that for many problems in this setting, parameterized computational complexity rather than NP-completeness is the appropriate tool for studying apparent intractability. At issue in the theory of parameterized complexity is whether a problem can be solved in time $O(n^{\alpha})$ for each fixed parameter value, where $\alpha$ is a constant independent of the parameter. In addition to surveying this complexity framework, we describe a new result for the longest common subsequence problem. In particular, we show that the problem is hard for $W[t]$ for all t when parameterized by the number of strings and the size of the alphabet. Lower bounds on the complexity of this basic combinatorial problem imply lower bounds on more general sequence alignment and consensus discovery problems. We also describe a number of open problems pertaining to the parameterized complexity of pro...
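The distinction is easiest to see on a problem that is fixed-parameter tractable. Vertex Cover (the source problem of the AML reduction above) admits a bounded search tree: any uncovered edge must have an endpoint in the cover, so branch on both endpoints. The sketch below is our illustration, not from the paper; it runs in O(2^k · |E|) time, of the form f(k) · n^α with α independent of k, whereas W[t]-hardness results such as the LCS one are taken as evidence that no algorithm of this form exists for the parameterized problem. `has_vertex_cover` is a hypothetical name.

```python
def has_vertex_cover(edges, k):
    """Decide whether the graph given by an edge list has a vertex cover of size <= k.
    Search-tree depth is at most k with two branches per level, so at most 2**k
    leaves, each doing O(|E|) filtering work: O(2**k * |E|) time overall."""
    if not edges:
        return True       # every edge is covered
    if k == 0:
        return False      # edges remain but the budget is spent
    u, v = edges[0]       # one of u, v must be in any cover
    return (has_vertex_cover([e for e in edges if u not in e], k - 1) or
            has_vertex_cover([e for e in edges if v not in e], k - 1))
```

A triangle needs two vertices, so `has_vertex_cover([(0, 1), (1, 2), (0, 2)], 1)` is False while the same call with k = 2 is True.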
Numerical Taxonomy on Data: Experimental Results
 In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms
, 1997
Abstract

Cited by 7 (1 self)
We consider the problem of fitting an $n \times n$ distance matrix D by a tree metric T. This problem is NP-hard for most reasonable distance functions between D and T. Recently, an approximation algorithm was presented [1] which achieves a factor of 3 approximation to the L∞ best-fitting tree. We call this method the Single Pivot (SP) heuristic. Within the biology community, the so-called Neighbor-Joining (NJ) heuristic [14] has wide acceptance. In this paper, we introduce a new Double Pivot (DP) heuristic, which is an extension of the SP heuristic, and show that DP outperforms NJ on biological and random data.
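The NJ heuristic mentioned above chooses, at each step, the pair of taxa minimizing Q(i, j) = (n - 2) D(i, j) - S_i - S_j, where S_i is the i-th row sum of D, and attaches i and j to a new internal node. A minimal sketch of one such step, using the standard Saitou and Nei formulas (our illustration; the SP and DP heuristics themselves are not shown, and `nj_step` is a hypothetical name):

```python
def nj_step(D, names):
    """One Neighbor-Joining step on a symmetric distance matrix D (list of lists,
    at least 3 taxa). Returns the joined pair and their branch lengths to the new node."""
    n = len(D)
    S = [sum(row) for row in D]  # row sums S_i
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - S[i] - S[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch length from i to the new node; j gets the remainder of D[i][j].
    li = D[i][j] / 2 + (S[i] - S[j]) / (2 * (n - 2))
    lj = D[i][j] - li
    return names[i], names[j], li, lj
```

A full NJ run repeats this step, replacing i and j by the new node u with distances D(u, k) = (D(i, k) + D(j, k) - D(i, j)) / 2, until three taxa remain.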