Results 11–20 of 21
Building the Tree of Life on Terascale Systems
In Proc. of the 21st International Parallel and Distributed Processing Symposium, 2007
Abstract

Cited by 5 (1 self)
Bayesian phylogenetic inference is an important alternative to maximum likelihood-based phylogenetic methods. However, inferring large trees with the Bayesian approach is computationally demanding, requiring huge amounts of memory and months of computational time. With a combination of novel parallel algorithms and the latest system technology, terascale phylogenetic tools will provide biologists the computational power necessary to conduct experiments on very large datasets, and thus aid construction of the tree of life. In this work we evaluate the performance of PBPI, a parallel application that reconstructs phylogenetic trees using MCMC-based Bayesian methods, on two terascale systems: Blue Gene/L at IBM Rochester and System X at Virginia Tech. Our results confirm that for a benchmark dataset with 218 taxa and 10,000 characters, PBPI can achieve linear speedup on 1024 or more processors on both systems.
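To make the scaling claim concrete: linear speedup means that running on p processors divides the serial time by roughly p. A minimal sketch of the standard speedup and efficiency metrics (the function names and timings below are illustrative, not measurements from the paper):

```python
# Standard parallel-performance metrics; illustrative only.
def speedup(t_serial, t_parallel):
    """Ratio of serial time to parallel time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Speedup normalized by processor count; 1.0 is ideal linear scaling."""
    return speedup(t_serial, t_parallel) / n_procs

# A hypothetical run taking 1024 hours serially and 1 hour on 1024
# processors shows ideal linear scaling.
print(speedup(1024.0, 1.0))           # 1024.0
print(efficiency(1024.0, 1.0, 1024))  # 1.0
```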
Breakpoint Medians and Breakpoint Phylogenies: A Fixed-Parameter Approach, 2002
Abstract

Cited by 4 (1 self)
With breakpoint distance, the genome rearrangement field delivered one of the currently most popular measures in phylogenetic studies of related species. Here, BREAKPOINT MEDIAN, which is NP-complete already for three given species (whose genomes are represented as signed orderings), is the core basic problem. For the important special case of three species, approximation (ratio 7/6) and exact heuristic algorithms have been developed. Here, we provide an exact, fixed-parameter algorithm with provable performance bounds. For instance, a breakpoint median for three signed orderings over n elements that causes at most d breakpoints can be computed in time O(2.15^d · n). We show the algorithm's practical usefulness through experimental studies. In particular, we demonstrate that a simple implementation of our algorithm, combined with a new tree construction heuristic, allows for a new approach to breakpoint phylogeny, yielding evolutionary trees that are competitive with known results developed in a recent series of papers that use clever algorithm engineering methods.
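The breakpoint distance this abstract builds on has a simple definition: an adjacency between two genes in one signed ordering counts as a breakpoint if it appears in the other ordering neither directly nor as its reverse complement. A minimal sketch under that usual definition (the function name and example orders are ours, not from the paper; the median problem then asks for a third ordering minimizing the total distance to three inputs):

```python
def breakpoint_distance(g1, g2):
    """Count breakpoints between two signed gene orders (lists of signed ints).

    An adjacency (a, b) in g1 is conserved in g2 if g2 contains (a, b)
    or its reverse complement (-b, -a); otherwise it is a breakpoint.
    """
    adj2 = set()
    for a, b in zip(g2, g2[1:]):
        adj2.add((a, b))
        adj2.add((-b, -a))   # reading the adjacency on the opposite strand
    return sum(1 for a, b in zip(g1, g1[1:]) if (a, b) not in adj2)

# Identical orders have distance 0; inverting the middle segment of
# 1 2 3 4 breaks the adjacencies at its two boundaries.
print(breakpoint_distance([1, 2, 3, 4], [1, 2, 3, 4]))    # 0
print(breakpoint_distance([1, 2, 3, 4], [1, -3, -2, 4]))  # 2
```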
Computational Biology and High-Performance Computing
Abstract
Understanding evolution and the basic structure and function of proteins are two grand challenge problems in biology that can be solved only through the use of high-performance computing. Computational biology has been revolutionized by advances in both computer hardware and software algorithms. Examples include assembling the human genome and using gene-expression chips to determine which genes are active in a cell [11, 12]. High-throughput techniques for DNA sequencing and analysis of gene expression have led to exponential growth in the amount of publicly available genomic data. For example, the genetic sequence information in the National Center for Biotechnology Information's GenBank database has nearly doubled in size each year for the past decade, with more than 37 million sequence records as of August 2004. Biologists are keen to analyze and understand this data, since genetic sequences determine the biological structure, and thus the function, of proteins. Understanding the function of biologically active molecules leads to understanding biochemical pathways and disease-prevention strategies and cures, along with the mechanisms of life itself. Increased availability of genomic data is not incremental. The amount is now so great that traditional database approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types. Computing systems are now so powerful it is …
Figure caption: Simulation of blood vessel formation through aggregation of dispersed endothelial cells. The model employs only experimentally confirmed behaviors of individual cells.
POY version 4: phylogenetic analysis using dynamic homologies
2009
Abstract
We present POY version 4, an open source program for the phylogenetic analysis of morphological, prealigned sequence, unaligned sequence, and genomic data. POY allows phylogenetic inference when not only substitutions but also insertions, deletions, and rearrangement events are allowed (computed using the breakpoint or inversion distance). Compared with previous versions, POY 4 provides greater flexibility, a larger number of supported parameter sets, numerous execution time improvements, a vastly improved user interface, greater quality control, and extensive documentation. We introduce POY's basic features and present a simple example illustrating the performance improvements over previous versions of the application. © The Willi Hennig Society 2009. POY is an open source phylogenetic analysis program for molecular and morphological data. Version 3.0.11 was released in September 2004, and work on version 4.0 began in 2005. After more than a year of public beta testing, which started early in 2007, versions 4.0 and 4.1 have now been released.
Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2
Abstract
Graph abstractions are extensively used to understand and solve challenging computational problems in various scientific and engineering domains. They have particularly gained prominence in recent years for applications involving large-scale networks. In this paper, we present fast parallel implementations of three fundamental graph theory problems, Breadth-First Search, st-connectivity, and shortest paths for unweighted graphs, on multithreaded architectures such as the Cray MTA-2. The architectural features of the MTA-2 aid the design of simple, scalable, and high-performance graph algorithms. We test our implementations on large scale-free and sparse random graph instances, and report impressive results, both for algorithm execution time and parallel performance. For instance, Breadth-First Search on a scale-free graph of 400 million vertices and 2 billion edges takes less than 5 seconds on a 40-processor MTA-2 system with an absolute speedup of close to 30. This is a significant result in parallel computing, as prior implementations of parallel graph algorithms report very limited or no speedup on irregular and sparse graphs when compared to the best sequential implementation.
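The serial counterpart of the frontier-expansion BFS that such implementations parallelize can be sketched in a few lines (the adjacency-list representation and names are our choice; the paper's MTA-2 versions exploit fine-grained hardware multithreading rather than a single queue):

```python
from collections import deque

def bfs_levels(graph, source):
    """Return the BFS level (hop distance) of every vertex reachable
    from source. graph is an adjacency list: {vertex: [neighbors]}."""
    level = {source: 0}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for w in graph.get(v, ()):
            if w not in level:        # first visit fixes the BFS level
                level[w] = level[v] + 1
                frontier.append(w)
    return level

# Small illustrative graph: 0 -> {1, 2} -> 3 -> 4
g = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
print(bfs_levels(g, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```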
Executive Director of High-Performance Computing
Abstract
Parallel algorithms and applications; high-performance computing for computational biology and genomics; large-scale phylogeny reconstruction; HPC Thrust Leader of CIPRES; IPDPS'07 Tutorial: HPC Methods for Computational Genomics. Ananth Kalyanaraman, Assistant Professor,
me Mü
2003
Abstract
; revis effective (Morgan, 1997). The power of all approaches, heuristic search strategy to find shortest trees. With the search per branch and the time invested. If the search parameters used are too lax, the BS values will probably purely cladistic methods, instead featuring maximum PAUP*'s data and tree file format (NEXUS; Maddison 1998), no program appears to be at hand that attempts to implement fast algorithms into the search strategies during heuristic searches under constraints. In a study employing branch support analysis with help of Autovoluti* common search strategies and the currently available processor speed, this ability shrinks rapidly as matrix sizes grow to more than 100–150 taxa (although strongly depending on the data set). While finding the most parsimonious (MP) tree for a large data set is time consuming itself, heuristic searches have to be repeated N times if N is the number of internal branches to be tested (N ≤ number of terminals − 2). Thus, a compromise has to be found between elaborateness of the et al., 1997) is understood or shared by a variety of other programs (Huelsenbeck and Ronquist, 2001; Maddison and Maddison, 1992; Müller and Müller, 2003a). However, as of this writing, neither the calculation of BS nor the parsimony ratchet or other fast algorithms are available in PAUP*. While for the older Mac OS systems (up to OS 9), software exists that simplifies application of the reverse constraint method for small to intermediate data sets (e.g., AutoDecay, Eriksson, however, strongly depends on the ability of the applied likelihood methods, and numerous statistical tests.