Results 1 - 8 of 8
pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree
 BMC bioinfo
Cited by 33 (3 self)
Background: Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. “Phylogenetic placement,” where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results: This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence.
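The two uncertainty summaries mentioned in this abstract (a per-edge probability and an expected distance between placement locations) can be illustrated with a minimal sketch. It assumes that per-edge log-likelihoods and a matrix of distances between candidate placement locations are already available. The function names and the toy numbers are hypothetical, and the sketch is not pplacer's own code or its exact normalization.

```python
import math

def normalize_log_weights(log_values):
    """Convert per-edge log-likelihoods into weights that sum to 1.

    Subtracting the maximum before exponentiating avoids underflow.
    """
    m = max(log_values)
    exps = [math.exp(v - m) for v in log_values]
    total = sum(exps)
    return [e / total for e in exps]

def expected_pairwise_distance(weights, dist):
    """Expected distance between two locations drawn independently
    according to `weights`, given a symmetric distance matrix `dist`."""
    n = len(weights)
    return sum(weights[i] * weights[j] * dist[i][j]
               for i in range(n) for j in range(n))

# Toy data (assumed): three candidate edges for one query sequence.
log_liks = [-1000.0, -1000.0, -1010.0]
dist = [[0.0, 0.1, 0.5],
        [0.1, 0.0, 0.4],
        [0.5, 0.4, 0.0]]

w = normalize_log_weights(log_liks)
edpl = expected_pairwise_distance(w, dist)
```

A query whose weight is concentrated on one edge gives an expected distance of zero. A query with weight spread over nearby edges gives a small value, and one with weight spread over distant edges gives a large value.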
The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples. Arxiv preprint arXiv:1005.1699
, 2010
Cited by 8 (2 self)
Abstract. Using modern technology, it is now common to survey microbial communities by sequencing DNA or RNA extracted in bulk from a given environment. Comparative methods are needed that indicate the extent to which two communities differ given data sets of this type. UniFrac, a method built around a somewhat ad hoc phylogenetics-based distance between two communities, is one of the most commonly used tools for these analyses. We provide a foundation for such methods by establishing that if one equates a metagenomic sample with its empirical distribution on a reference phylogenetic tree, then the weighted UniFrac distance between two samples is just the classical Kantorovich-Rubinstein (KR) distance between the corresponding empirical distributions. We demonstrate that this KR distance and extensions of it that arise from incorporating uncertainty in the location of sample points can be written as a readily computable integral over the tree, we develop $L^p$ Zolotarev-type generalizations of the metric, and we show how the p-value of the resulting natural permutation test of the null hypothesis “no difference between the two communities” can be approximated using a functional of a Gaussian process indexed by the tree. We relate the $L^2$ case to an ANOVA-type decomposition and find that the distribution of its associated Gaussian functional is that of a computable linear combination of independent $\chi^2_1$ random variables.
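The "readily computable integral over the tree" can be illustrated for the simplest case. The sketch below assumes all probability mass sits on nodes of a rooted tree, whereas the abstract allows mass at arbitrary points along edges. Under that assumption the integral reduces to a sum over edges of the edge length times the absolute difference in mass below the edge. The tree encoding (`parent`, `length`) and the function name are hypothetical choices for illustration.

```python
def kr_distance(parent, length, p, q):
    """Kantorovich-Rubinstein distance between two distributions on tree nodes.

    parent: dict node -> parent node (the root maps to None)
    length: dict node -> length of the edge above that node
    p, q:   dict node -> probability mass (each summing to 1)
    """
    # Net mass difference located at each node.
    diff = {v: p.get(v, 0.0) - q.get(v, 0.0) for v in parent}

    def depth(v):
        d = 0
        while parent[v] is not None:
            v = parent[v]
            d += 1
        return d

    total = 0.0
    # Visit deepest nodes first so each subtree total is complete
    # before it is pushed up to the parent.
    for v in sorted(parent, key=depth, reverse=True):
        u = parent[v]
        if u is None:
            continue
        total += length[v] * abs(diff[v])
        diff[u] += diff[v]
    return total

# Toy tree (assumed): root -> X -> {a, b}, root -> c.
parent = {"root": None, "X": "root", "a": "X", "b": "X", "c": "root"}
length = {"X": 1.0, "a": 0.5, "b": 0.5, "c": 2.0}
p = {"a": 1.0}   # sample 1: all mass at leaf a
q = {"c": 1.0}   # sample 2: all mass at leaf c

d_kr = kr_distance(parent, length, p, q)
```

With both samples concentrated on single leaves, the result is the path length between those leaves (0.5 + 1.0 + 2.0 in the toy tree), which matches the intuition of moving one unit of mass along the tree.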
Polyhedral geometry of phylogenetic rogue taxa
 BULLETIN OF MATHEMATICAL BIOLOGY
, 2010
Cited by 5 (0 self)
It is well known among phylogeneticists that adding an extra taxon (e.g. species) to a data set can alter the structure of the optimal phylogenetic tree in surprising ways. However, little is known about this “rogue taxon” effect. In this paper we characterize the behavior of balanced minimum evolution (BME) phylogenetics on data sets of this type using tools from polyhedral geometry. First we show that for any distance matrix there exist distances to a “rogue taxon” such that the BME-optimal tree for the data set with the new taxon does not contain any nontrivial splits (bipartitions) of the optimal tree for the original data. Second, we prove a theorem which restricts the topology of BME-optimal trees for data sets of this type, thus showing that a rogue taxon cannot have an arbitrary effect on the optimal tree. Third, we construct polyhedral cones computationally which give complete answers for BME rogue taxon behavior when our original data fits a tree on four, five, and six taxa. We use these cones to derive sufficient conditions for rogue taxon behavior for four taxa, and to understand the frequency of the rogue taxon effect via simulation.
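For readers unfamiliar with BME, the four-taxon case can be sketched directly. The code assumes Pauplin's formula for the balanced tree length, in which each pairwise distance is weighted by 2^(1 - k) for a leaf-to-leaf path of k edges, so cherry pairs get weight 1/2 and cross pairs get weight 1/4 in a quartet. The function names and the toy distance matrix are hypothetical, and the sketch is not the polyhedral construction used in the paper.

```python
def bme_length_quartet(split, d):
    """Balanced minimum evolution length of the quartet tree ((a, b), (c, e)).

    d maps frozenset({x, y}) to the distance between taxa x and y.
    """
    (a, b), (c, e) = split
    # Cherry pairs are joined by a 2-edge path: weight 2**(1 - 2) = 1/2.
    cherries = d[frozenset((a, b))] + d[frozenset((c, e))]
    # Cross pairs are joined by a 3-edge path: weight 2**(1 - 3) = 1/4.
    cross = sum(d[frozenset((x, y))] for x in (a, b) for y in (c, e))
    return cherries / 2 + cross / 4

def best_quartet(taxa, d):
    """Return the quartet topology with the smallest BME length."""
    w, x, y, z = taxa
    topologies = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    return min(topologies, key=lambda s: bme_length_quartet(s, d))

# Toy distances (assumed): additive on ((A, B), (C, D)) with all
# five branch lengths equal to 1.
d = {
    frozenset(("A", "B")): 2, frozenset(("C", "D")): 2,
    frozenset(("A", "C")): 3, frozenset(("A", "D")): 3,
    frozenset(("B", "C")): 3, frozenset(("B", "D")): 3,
}

best = best_quartet(("A", "B", "C", "D"), d)
best_len = bme_length_quartet(best, d)
```

On additive data the true topology attains the true total branch length and the two competing topologies come out longer. A rogue-taxon experiment in this setting would add a fifth taxon with chosen distances and check which splits of the original quartet survive.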
Evolutionary placement of short sequence reads on multicore architectures
 Proceedings of AICCSA-10, at 8th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA-10)
, 2010
Cited by 3 (0 self)
Abstract—The application of high performance computing methods in bioinformatics becomes increasingly important because of the masses of data generated by novel short-read DNA sequencers. One important application of such short reads is the analysis of microbial communities, where the anonymous short reads need to be identified by sequence comparison to a set of reference sequences. This identification is required to analyze the microbial composition and biological diversity of the sample. We briefly introduce a new algorithm for evolutionary (phylogenetic) placement of short reads under the Maximum Likelihood criterion and implement it in RAxML. While this algorithm is significantly more accurate than plain pairwise sequence comparison, it can become highly compute-intensive when a typical number of 100,000 reads or more need to be placed into an existing phylogenetic tree. Therefore, we deploy multi-grain parallelism to improve parallel efficiency of this algorithm on 16-core and 32-core architectures. Via this multi-grain approach, we achieve parallel execution time improvements of 25% and superlinear speedups on 16 cores, as well as near-linear speedups and improvements exceeding 50% on 32 cores on two large real-world microbial datasets. Evolutionary placement of 100,000 reads into a tree with more than 4,000 taxa now requires less than 2 hours of execution time on 32 cores.
DOI:10.1093/sysbio/syu126 Taxon-Rich Phylogenomic Analyses Resolve the Eukaryotic Tree of Life and Reveal the Power of Subsampling by Sites
, 2014
Abstract.—Most eukaryotic lineages are microbial, and many have only recently been sampled for phylogenetic studies or remain in the “dark area” of the tree of life where there are no molecular data. To assess relationships among eukaryotic lineages, we perform a taxon-rich phylogenomic analysis including 232 eukaryotes selected to maximize taxonomic diversity and up to 1554 genes chosen as vertically inherited based on their broad distribution among eukaryotes. We also include sequences from 486 bacteria and 84 archaea to assess the impact of endosymbiotic gene transfer (EGT) from plastids and to detect contamination. Overall, our analyses are consistent with other less taxon-rich estimates of the eukaryotic tree of life, and we recover strong support for five major clades: Amoebozoa, Excavata (without the genus Malawimonas), Opisthokonta, Archaeplastida, and SAR (Stramenopila, Alveolata, and Rhizaria). Our analyses also highlight the existence of “orphan” lineages, lineages that lack robust placement in the eukaryotic tree of life, and indicate the possibility of as yet undiscovered diversity. In analyses including bacteria and archaea, we find that approximately 10% of the 1554 genes, which we choose because they are found in four or five of the five major eukaryotic clades and hence may be more likely to be inherited vertically, appear to have been acquired from cyanobacteria through EGT in photosynthetic lineages. Removing these EGT genes places the green algae as sister to the glaucophytes instead of the red algae, suggesting that unknowingly including
Title: Taxon-Rich Phylogenomic Analyses Resolve The Eukaryotic Tree Of Life And Reveal The Power Of Subsampling By Sites
METHODOLOGY ARTICLE Open Access
pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree