A hybrid micro-macroevolutionary approach to gene tree reconstruction
J. Comput. Biol., 2006
Cited by 64 (4 self)
Gene family evolution is determined by microevolutionary processes (e.g., point mutations) and macroevolutionary processes (e.g., gene duplication and loss), yet macroevolutionary considerations are rarely incorporated into gene phylogeny reconstruction methods. We present a dynamic program to find the most parsimonious gene family tree with respect to a macroevolutionary optimization criterion, the weighted sum of the number of gene duplications and losses. The existence of a polynomial-delay algorithm for duplication/loss phylogeny reconstruction stands in contrast to most formulations of phylogeny reconstruction, which are NP-complete. We next extend this result to obtain a two-phase method for gene tree reconstruction that takes both micro- and macroevolution into account. In the first phase, a gene tree is constructed from sequence data, using any of the previously known algorithms for gene phylogeny construction. In the second phase, the tree is refined by rearranging regions of the tree that do not have strong support in the sequence data to minimize the duplication/loss cost. Components of the tree with strong support are left intact. This hybrid approach incorporates both micro- and macroevolutionary considerations, yet its computational requirements are modest in practice because the two-phase approach constrains the search space. Our hybrid algorithm can ...
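The duplication/loss criterion can be made concrete by scoring one fixed gene tree against a species tree. The sketch below uses standard LCA reconciliation. It only evaluates a given tree and is not the authors' dynamic program, which searches over trees. Trees are assumed to be nested 2-tuples with string leaves, and the function names and default weights are hypothetical.

```python
# Minimal sketch: weighted duplication/loss cost of a FIXED gene tree via LCA
# reconciliation (textbook counting, not the paper's tree-search algorithm).
# Assumptions: binary trees as nested 2-tuples, unique species leaf names.

def index_species_tree(tree, parent=None, depth=0, parents=None, depths=None):
    """Record each species-tree node's parent and depth (root depth = 0)."""
    if parents is None:
        parents, depths = {}, {}
    parents[tree] = parent
    depths[tree] = depth
    if isinstance(tree, tuple):
        for child in tree:
            index_species_tree(child, tree, depth + 1, parents, depths)
    return parents, depths


def lca(a, b, parents, depths):
    """Lowest common ancestor: lift the deeper node, then lift both together."""
    while depths[a] > depths[b]:
        a = parents[a]
    while depths[b] > depths[a]:
        b = parents[b]
    while a != b:
        a, b = parents[a], parents[b]
    return a


def duplication_loss_cost(gene_tree, species_of, species_tree,
                          dup_weight=1.0, loss_weight=1.0):
    """Return (duplications, losses, weighted cost) for one gene tree."""
    parents, depths = index_species_tree(species_tree)
    counts = {"dup": 0, "loss": 0}

    def walk(g):
        if isinstance(g, str):
            return species_of[g]                  # gene leaf -> species leaf
        left, right = walk(g[0]), walk(g[1])
        s = lca(left, right, parents, depths)
        if s == left or s == right:               # a child maps to s: duplication
            counts["dup"] += 1
            counts["loss"] += (depths[left] - depths[s]) + (depths[right] - depths[s])
        else:                                     # speciation node
            counts["loss"] += (depths[left] - depths[s] - 1) + (depths[right] - depths[s] - 1)
        return s

    walk(gene_tree)
    cost = dup_weight * counts["dup"] + loss_weight * counts["loss"]
    return counts["dup"], counts["loss"], cost
```

For example, with species tree `(("A", "B"), "C")` and gene tree `(("a1", "b1"), ("a2", "c1"))`, the root is inferred to be a duplication, and each copy has lost one species (C on one side, B on the other), giving `(1, 2, 3.0)`.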
A short proof that phylogenetic tree reconstruction by maximum likelihood is hard
IEEE/ACM Trans. Comput. Biol. Bioinform.
Cited by 48 (7 self)
Maximum likelihood is one of the most widely used techniques to infer evolutionary histories. Although it is thought to be intractable, a proof of its hardness has been lacking. Here, we give a short proof that computing the maximum likelihood tree is NP-hard by exploiting a connection between likelihood and parsimony observed by Tuffley and Steel.
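The parsimony side of the Tuffley–Steel connection is the parsimony score of a tree. As background only, and not the hardness proof itself, here is a minimal sketch of Fitch's small-parsimony algorithm on a fixed rooted binary tree. It assumes trees are nested tuples and that the alignment is a dict mapping taxon names to equal-length strings. The helper names are hypothetical.

```python
# Minimal sketch of Fitch's small-parsimony algorithm on a fixed rooted binary tree.
# Assumptions: tree = nested 2-tuples with taxon-name leaves; alignment = {taxon: sequence}.

def fitch(tree, states):
    """Return (candidate state set, minimum number of changes) for one column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                   # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch costs over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )
```

On the tree `(("t1", "t2"), ("t3", "t4"))`, the column `A A C C` costs one change, while `A C A C` costs two.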
Learning Nonsingular Phylogenies and Hidden Markov Models
Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing (STOC '05), Baltimore, 2005
Cited by 45 (7 self)
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call a Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
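To make the nonsingularity condition concrete, here is a minimal sketch that checks whether every transition matrix has a determinant whose absolute value lies in `[delta, 1 - delta]`. The threshold `delta` and the function names are hypothetical. The two-state symmetric matrix used in the example has determinant `1 - 2p`, so it becomes singular at `p = 0.5` and reaches the forbidden value 1 at `p = 0`.

```python
# Minimal sketch: test whether a set of transition matrices satisfies a
# "determinant bounded away from 0 and 1" condition with a hypothetical threshold.

def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting (pure Python)."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0                           # no usable pivot: singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result                     # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def is_nonsingular(matrices, delta):
    """True if every |det(P)| lies in [delta, 1 - delta]."""
    return all(delta <= abs(determinant(p)) <= 1 - delta for p in matrices)


def two_state_matrix(p):
    """Symmetric two-state transition matrix with flip probability p."""
    return [[1 - p, p], [p, 1 - p]]
```

With `delta = 0.05`, the matrices for `p = 0.1` and `p = 0.2` pass. The matrices for `p = 0.5` (determinant 0) and `p = 0.0` (determinant 1) each fail.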
Phylogenetic models of rate heterogeneity: A high performance computing perspective
In Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS), 2006
Cited by 43 (9 self)
Inference of phylogenetic trees using the maximum likelihood (ML) method is NP-hard. Furthermore, computing the likelihood function for huge trees of more than 1,000 organisms is computationally intensive because of the large number of floating-point operations and the high memory consumption. In this context, the present paper compares two competing mathematical models that account for evolutionary rate heterogeneity: the Γ and CAT models. The aim of this paper is to show that, from a purely empirical point of view, CAT can be used instead of Γ. The main advantages of CAT over Γ are significantly lower memory consumption and faster inference times. An experimental study using RAxML was performed on 19 real-world datasets comprising 73 to 1,663 DNA sequences. Results show that CAT is on average 5.5 times faster than Γ and, surprisingly, also yields trees with slightly better Γ likelihood values. Using the CAT model reduces the average number of L2 and L3 cache misses by a factor of 8.55.
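The cost difference between the two models can be seen in a toy setting. Under Γ, every site's likelihood is averaged over all K rate categories. Under CAT, each site is evaluated once, using the single rate it has been assigned. The sketch below assumes two aligned DNA sequences under the Jukes–Cantor (JC69) model. The category rates are hypothetical placeholders, not values from a discretized Gamma distribution, and neither function reflects RAxML's actual implementation.

```python
# Toy sketch contrasting Γ-style and CAT-style rate heterogeneity on two aligned
# sequences under JC69. Rates are hypothetical; this is not RAxML's code.
import math


def jc69_transition(same, distance):
    """JC69 P_ij(distance): i == j if `same`, otherwise one specific i != j."""
    decay = math.exp(-4.0 * distance / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def gamma_loglik(seq1, seq2, t, category_rates):
    """Average each site's likelihood over ALL rate categories (K evaluations per site)."""
    total, evaluations = 0.0, 0
    for x, y in zip(seq1, seq2):
        site = 0.0
        for r in category_rates:
            site += 0.25 * jc69_transition(x == y, r * t) / len(category_rates)
            evaluations += 1
        total += math.log(site)
    return total, evaluations


def cat_loglik(seq1, seq2, t, site_rates):
    """Evaluate each site ONCE with its own assigned rate (1 evaluation per site)."""
    total, evaluations = 0.0, 0
    for x, y, r in zip(seq1, seq2, site_rates):
        total += math.log(0.25 * jc69_transition(x == y, r * t))
        evaluations += 1
    return total, evaluations
```

For an 8-site alignment and four hypothetical categories, the Γ-style function performs 32 transition-probability evaluations where the CAT-style function performs 8. With a single rate of 1.0, the two log-likelihoods coincide.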
Maximum likelihood of phylogenetic networks
 Bioinformatics
Cited by 33 (10 self)
Motivation: Horizontal gene transfer (HGT) is believed to be ubiquitous among bacteria, and it plays a major role in their genome diversification as well as in their ability to develop resistance to antibiotics. In light of its evolutionary significance and implications for human health, developing accurate and efficient methods for detecting and reconstructing HGT is imperative. Results: In this article we provide a new HGT-oriented likelihood framework for many problems that involve phylogeny-based HGT detection and reconstruction. Besides formulating various likelihood criteria, we show that most of these problems are NP-hard, and we offer heuristics for efficient and accurate reconstruction of HGT under these criteria. We implemented our heuristics and used them to analyze biological as well as synthetic data. In both cases, our criteria and heuristics exhibited very good performance with respect to identifying the correct number of HGT events as well as inferring their correct location on the species tree. Availability: Implementations of the criteria and heuristics, as well as the hardness proofs, are available from the authors upon request. Hardness proofs can also be downloaded at
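A phylogenetic network with HGT edges can be viewed through the trees it displays. Each displayed tree is obtained by keeping exactly one incoming edge at every reticulation node (the vertical edge or the transfer edge). Likelihood criteria over networks are commonly phrased in terms of these displayed trees. That is stated here as general background, not as this paper's specific criteria. The sketch below assumes the network is a dict mapping each node to its list of parents, and it does not suppress the resulting degree-2 nodes.

```python
# Minimal sketch: enumerate the edge sets of trees displayed by a rooted network.
# Assumption: `parents` maps node -> list of parent nodes (root maps to []).
# Nodes with 2+ parents are reticulations (e.g., HGT recipients). Degree-2 nodes
# left behind are NOT suppressed here.
from itertools import product


def displayed_trees(parents):
    reticulations = sorted(n for n, ps in parents.items() if len(ps) > 1)
    trees = []
    # One choice of incoming edge per reticulation node: up to 2^k trees for k
    # reticulations with two parents each.
    for choice in product(*(parents[n] for n in reticulations)):
        chosen = dict(zip(reticulations, choice))
        edges = set()
        for node, ps in parents.items():
            if node in chosen:
                edges.add((chosen[node], node))
            elif ps:
                edges.add((ps[0], node))
        trees.append(frozenset(edges))
    return trees
```

For example, if node `h` has parents `x` (vertical) and `y` (transfer donor), the network displays two trees. One contains the edge `("x", "h")` and the other contains `("y", "h")`.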
Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood
Bioinformatics, 2005
Cited by 31 (9 self)
Motivation: Maximum likelihood methods have become very popular for constructing phylogenetic trees from sequence data. However, despite noticeable recent progress, with large and difficult data sets (e.g. multiple genes with conflicting signals) current ML programs still require huge computing times and can become trapped in bad local optima of the likelihood function. When this occurs, the resulting trees may still show some of the defects (e.g. long branch attraction) of starting trees obtained using fast distance or parsimony programs. Methods: Subtree Pruning and Regrafting (SPR) topological rearrangements are usually sufficient to search the tree space thoroughly. Here, we propose two new methods to make SPR moves more efficient. The first method uses a fast distance-based approach to detect the least promising candidate SPR moves, which are then simply discarded. The second method locally estimates the change in likelihood for any remaining potential SPRs, as opposed to globally evaluating the entire tree for each possible move. These two methods are implemented in a new algorithm with a sophisticated filtering strategy, which efficiently selects potential SPRs and concentrates most of the likelihood computation on the promising moves. Results: Experiments with real data sets comprising 35 to 250 taxa show that, while greatly reducing the amount of computation, our approach provides likelihood values at least as good as those of the best-known ML methods so far, and is very robust to poor starting trees. Furthermore, by combining our new SPR algorithm with local moves such as PHYML's nearest neighbor interchanges, the time needed to find good solutions can sometimes be reduced even further. Availability: Executables of our SPR program and the data sets used are available for download at
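The filtering idea can be sketched as a two-stage pattern. All candidate moves are ranked with a cheap score, the least promising ones are discarded, and the expensive evaluation is spent only on the survivors. In the sketch below, `cheap_score` and `expensive_score` are hypothetical stand-ins for the paper's distance-based filter and local likelihood estimate, and the `keep_fraction` parameter is an assumption. Higher scores mean more promising moves.

```python
# Minimal sketch of two-stage candidate filtering (generic pattern, not the
# authors' SPR algorithm). Higher scores mean more promising candidates.


def filter_and_evaluate(candidates, cheap_score, expensive_score, keep_fraction=0.25):
    """Return (best candidate, number of expensive evaluations performed)."""
    if not candidates:
        return None, 0
    # Stage 1: rank every candidate with the cheap score (stable sort).
    ranked = sorted(candidates, key=cheap_score, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    survivors = ranked[:keep]
    # Stage 2: spend the expensive evaluation only on the survivors.
    best = max(survivors, key=expensive_score)
    return best, len(survivors)
```

With 20 integer "moves", a cheap score that prefers values near 7, and an expensive score that peaks at 8, only 5 expensive evaluations are made, and move 8 is still found.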
Dynamic multigrain parallelization on the Cell Broadband Engine
In Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 90–100, 2007
Cited by 31 (10 self)
This paper addresses the problem of orchestrating and scheduling parallelism at multiple levels of granularity on heterogeneous multicore processors. We present policies and mechanisms for adaptive exploitation and scheduling of multiple layers of parallelism on the Cell Broadband Engine. Our policies combine event-driven task scheduling with malleable loop-level parallelism, which is exposed from the runtime system whenever task-level parallelism leaves cores idle. We present a runtime system for scheduling applications with layered parallelism on Cell and investigate its potential with RAxML, a computational biology application that infers large phylogenetic trees using the Maximum Likelihood (ML) method. Our experiments show that the Cell benefits significantly from dynamic parallelization methods that selectively exploit the layers of parallelism in the system in response to workload characteristics. Our runtime environment outperforms naive parallelization and scheduling based on MPI and Linux by up to a factor of 2.6. We are able to execute RAxML on one Cell four times faster than on a dual-processor system with Hyper-Threaded Xeon processors, and 5–10% faster than on a single-processor system with a dual-core, quad-thread IBM Power5 processor.
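The policy of falling back to loop-level parallelism when there are too few tasks can be illustrated with a simple allocation rule. If ready tasks outnumber cores, each running task gets one core. Otherwise, the idle cores are shared out so that each task's loops can be split across several cores. This is a hypothetical sketch of the idea, not the paper's runtime policy. It assumes unique task names.

```python
# Hypothetical sketch: allocate cores to ready tasks, giving extra cores to tasks
# (for loop-level splitting) only when task-level parallelism leaves cores idle.


def allocate_cores(num_cores, ready_tasks):
    """Map task name -> number of cores (task names assumed unique)."""
    if not ready_tasks:
        return {}
    if len(ready_tasks) >= num_cores:
        # Enough tasks: one core each; the remaining tasks wait.
        return {task: 1 for task in ready_tasks[:num_cores]}
    # Too few tasks: distribute all cores as evenly as possible.
    base, extra = divmod(num_cores, len(ready_tasks))
    return {task: base + (1 if i < extra else 0) for i, task in enumerate(ready_tasks)}
```

With 8 cores (the number of SPEs on one Cell) and three ready tasks, the allocation is 3, 3, and 2 cores. With ten ready tasks, eight of them run on one core each.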
The Average Common Substring Approach to Phylogenomic Reconstruction
, 2005
Cited by 28 (0 self)
We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes, whose lengths may vary greatly. The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings. It is intrinsically related to information-theoretic tools (Kullback–Leibler relative entropy). We present an algorithm for efficiently computing these distances. In principle, the distance between two sequences of length ℓ can be calculated in O(ℓ) time. We implemented the algorithm using suffix arrays. The implementation is fast enough to enable the construction of the proteome phylogenomic tree for hundreds of species, and the genome phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with “acceptable phylogenetic and taxonomic truth”. To assess our approach, we compared it to the traditional (single gene or protein based) maximum likelihood method. We also compared it to implementations of a number of alternative approaches, including two that were previously published in the literature, and to the published results of a third approach. Comparing their outcome and running time to ours, using “traditional” trees and a standard tree comparison method, our algorithm improved upon the “competition” by a substantial margin. The simplicity and speed of our method allow for a whole-genome analysis with the greatest scope attempted so far. We describe here five different applications of the method, which not only show the validity of the method, but also suggest a number of novel phylogenetic insights.
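The core quantity can be sketched directly. For each position i of sequence A, take the length of the longest prefix of A[i:] that occurs anywhere in B, then average these lengths over A. The naive version below is far slower than the suffix-array implementation described above and is meant only to show the definition. The distance normalization used here, log|B| / L(A,B) − 2·log|A| / |A|, symmetrized by averaging both directions, is our reading of the average-common-substring approach and should be treated as an assumption.

```python
# Naive sketch of the average common substring (ACS) measure. Quadratic-or-worse;
# the paper uses suffix arrays. The distance normalization is an assumption.
import math


def average_common_substring(a, b):
    """Mean over positions i of a of the longest prefix of a[i:] occurring in b."""
    total = 0
    for i in range(len(a)):
        k = 0
        while i + k < len(a) and a[i:i + k + 1] in b:
            k += 1
        total += k
    return total / len(a)


def acs_distance(a, b):
    """Assumed normalization: log|b| / L(a, b) - 2 log|a| / |a|."""
    acs = average_common_substring(a, b)
    if acs == 0:
        return math.inf                          # no shared characters at all
    return math.log(len(b)) / acs - 2 * math.log(len(a)) / len(a)


def symmetric_acs_distance(a, b):
    """Average the two directed distances."""
    return (acs_distance(a, b) + acs_distance(b, a)) / 2
```

For A = `"abcd"` and B = `"bcda"`, the per-position match lengths are 1, 3, 2, and 1, so L(A, B) = 1.75.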
Evolutionary Phylogenetic Networks: Models and Issues
Cited by 18 (5 self)
Phylogenetic networks are special graphs that generalize phylogenetic trees to allow for modeling of non-treelike evolutionary histories. The ability to sequence multiple genetic markers from a set of organisms, and the conflicting evolutionary signals that these markers provide in many cases, have propelled research and interest in phylogenetic networks to the forefront of computational phylogenetics. Nonetheless, the term ‘phylogenetic network’ has been used generically to refer to a class of models whose core shared property is tree generalization. Several excellent surveys of the different flavors of phylogenetic networks and methods for their reconstruction have been written recently. However, unlike these surveys, this chapter focuses specifically on one type of phylogenetic network, namely evolutionary phylogenetic networks, which explicitly model reticulate evolutionary events. Further, this chapter focuses less on surveying existing tools and addresses in more detail issues that are central to the accurate reconstruction of phylogenetic networks.