Scaling up accurate phylogenetic reconstruction from gene-order data, 2002
Cited by 28 (13 self)
Motivation: Phylogenetic reconstruction from gene-order data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods (such as neighbor-joining), parsimony methods using sequence-based encodings, Bayesian approaches, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but cannot handle more than about 15 genomes of limited size (e.g., organelles). Results: We report here on our successful efforts to scale up direct optimization through a two-step approach: the first step decomposes the dataset into smaller pieces and runs the direct optimization (GRAPPA) on the smaller pieces, while the second step builds a tree from the results obtained on the smaller pieces. We used the sophisticated disk-covering method (DCM) pioneered by Warnow and her group, suitably modified to take into account the computational limitations of GRAPPA. We find that DCM-GRAPPA scales gracefully to at least 1,000 genomes of a few hundred genes each and retains surprisingly high accuracy throughout the range: in our experiments, the topological error rate rarely exceeded a few percent. Thus, reconstruction based on gene-order data can now be accomplished with high accuracy on datasets of significant size. Availability: All of our software is available in source form under GPL at www.compbio.unm.edu Contact:
On the Similarity of Sets of Permutations and its Applications to Genome Comparison, 2003
Cited by 26 (6 self)
The comparison of genomes with the same gene content relies on our ability to compare permutations, either by measuring how much they differ, or by measuring how much they are alike. With the notable exception of the breakpoint distance, which is based on the concept of conserved adjacencies, measures of distance do not generalize easily to sets of more than two permutations. In this paper, we present a basic unifying notion, conserved intervals, as a powerful generalization of adjacencies, and as a key feature of genome rearrangement theories. We also show that sets of conserved intervals have elegant nesting and chaining properties that allow the development of compact graphic representations, and linear time algorithms to manipulate them.
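To make the notion of conserved adjacencies concrete, here is a minimal Python sketch of the breakpoint distance between two signed linear permutations. It assumes the common convention of framing each permutation with 0 and n+1 and of treating the pair (a, b) as the same adjacency as (-b, -a). The function names are illustrative and are not taken from the paper.

```python
def adjacencies(perm):
    """Return all adjacencies of a signed linear permutation.

    The permutation is framed with 0 and n+1 (an assumed convention).
    Each adjacency (a, b) is also stored as (-b, -a), its reading on
    the opposite strand.
    """
    framed = [0] + list(perm) + [len(perm) + 1]
    adj = set()
    for a, b in zip(framed, framed[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoint_distance(p, q):
    """Count adjacencies of p that are not conserved in q."""
    q_adj = adjacencies(q)
    framed = [0] + list(p) + [len(p) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if (a, b) not in q_adj)


identity = [1, 2, 3, 4]
inverted = [1, -3, -2, 4]   # genes 2 and 3 inverted as a block
print(breakpoint_distance(identity, inverted))
```

A single inversion creates at most two breakpoints, one at each end of the inverted segment, which is what this example exhibits.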
Genomic Distances under Deletions and Insertions - THEORETICAL COMPUTER SCIENCE, 2003
Cited by 23 (6 self)
As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multi-set of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications). In this paper we extend El-Mabrouk's work to handle duplications as well as insertions and present an alternate framework for computing (near) minimal edit sequences involving insertions, deletions, and inversions. We derive an error bound for our polynomial-time distance computation under various assumptions and present preliminary experimental results that suggest that performance in practice may be excellent, within a few percent of the actual distance.
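The three edit operations named in this abstract can be illustrated on a signed permutation stored as a Python list. This is only a sketch of the operations themselves, with hypothetical function names and 0-based inclusive indices chosen for illustration. It does not compute an edit distance and is not the algorithm of the paper.

```python
def invert(genome, i, j):
    """Inversion: reverse genome[i..j] (inclusive) and flip every sign."""
    return genome[:i] + [-g for g in reversed(genome[i:j + 1])] + genome[j + 1:]


def delete(genome, i, j):
    """Deletion: remove the genes genome[i..j] (inclusive)."""
    return genome[:i] + genome[j + 1:]


def insert(genome, i, genes):
    """Insertion: place the list `genes` before position i."""
    return genome[:i] + list(genes) + genome[i:]


g = [1, 2, 3, 4, 5]
g = invert(g, 1, 3)     # [1, -4, -3, -2, 5]
g = delete(g, 2, 2)     # [1, -4, -2, 5]
g = insert(g, 4, [6])   # [1, -4, -2, 5, 6]
print(g)
```

Applying the same inversion twice restores the original order, which is why inversions are usually treated as undirected moves when counting edit steps.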
Phylogenetic reconstruction from gene rearrangement data with unequal gene contents - in Algorithms and Data Structures, 8th International Workshop, WADS 2003, 2003
Cited by 15 (2 self)
Phylogenetic reconstruction from gene-rearrangement data has seen increased attention over the last five years. Existing methods are limited computationally and by the assumption (highly unrealistic in practice) that all genomes have the same gene content. We have recently shown that we can scale our reconstruction tool, GRAPPA, to instances with up to a thousand genomes with no loss of accuracy and at minimal computational cost. Computing genomic distances between two genomes with unequal gene contents has seen much progress recently, but that progress has not yet been reflected in phylogenetic reconstruction methods. In this paper, we present extensions to our GRAPPA approach that can handle limited numbers of duplications (one of the main requirements for analyzing genomic data from organelles) and a few deletions. Although GRAPPA is based on exhaustive search, we show that, in practice, our bounding functions suffice to prune away almost all of the search space (our pruning rates never fall below 99.995%), resulting in high accuracy and fast running times. The range of values within which we have tested our approach encompasses mitochondria and chloroplast organellar genomes, whose phylogenetic analysis is providing new insights on evolution.
Keywords: computational biology, phylogenetic reconstruction, gene-order data, whole-genome data, signed
A Lower Bound for the Breakpoint Phylogeny Problem, 2004
Cited by 10 (1 self)
Breakpoint phylogeny methods have been shown to be an effective tool for extracting phylogenetic information from gene order data. Currently, the only practical breakpoint phylogeny algorithms for the analysis of large genomes with varied gene content are heuristics with no optimality guarantee. Here we begin to address this lack by deriving lower bounds for the breakpoint median problem and for the more complicated breakpoint phylogeny problem. In both cases we employ Lagrange multipliers and sub-gradient optimization to tighten the bounds. The bounds have been implemented and are available as part of the GOTREE package (http://www.math.mcgill.ca/bryant/gotree).
Reconstructing Ancestral Gene Orders Using Conserved Intervals - Proc. Fourth Int’l Workshop Algorithms in Bioinformatics (WABI ’04), 2004
Cited by 7 (4 self)
Conserved intervals were recently introduced as a measure of similarity between genomes whose genes have been shuffled during evolution by genomic rearrangements. Phylogenetic reconstruction based on such similarity measures raises many biological, formal and algorithmic questions, in particular the labelling of internal nodes with putative ancestral gene orders, and the selection of a good tree topology. In this paper, we investigate the properties of sets of permutations associated with conserved intervals as a representation of putative ancestral gene orders for a given tree topology. We define set-theoretic operations on sets of conserved intervals, together with the associated algorithms, and we apply these techniques, in a manner similar to the Fitch-Hartigan algorithm for parsimony, to a subset of chloroplast genes of 13 species.
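For readers unfamiliar with the Fitch-Hartigan reference, the sketch below shows the classic bottom-up Fitch rule for a single character on a rooted binary tree: take the intersection of the two child state sets when it is non-empty, otherwise take their union and count one change. It is shown on plain character states only. The paper applies analogous set operations to sets of conserved intervals, which this sketch does not attempt. The nested-tuple tree encoding is an assumption made for brevity.

```python
def fitch(node):
    """Bottom-up Fitch pass.

    A leaf is a set of observed states; an internal node is a pair
    (left, right). Returns (candidate_state_set, minimum_changes).
    """
    if isinstance(node, (set, frozenset)):
        return set(node), 0
    left, right = node
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: keep both options and charge one change.
    return left_states | right_states, left_cost + right_cost + 1


tree = (({"A"}, {"C"}), ({"A"}, {"G"}))
states, cost = fitch(tree)
print(states, cost)
```

In this example each cherry needs one change, and the root can be labelled A without any further cost.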
Transforming men into mice: the Nadeau-Taylor chromosomal breakage model revisited - IN RECOMB, 2003
Cited by 6 (1 self)
Although analysis of genome rearrangements was pioneered by Dobzhansky and Sturtevant 65 years ago, we still know very little about the rearrangement events that produced the existing varieties of genomic architectures. The genomic sequences of human and mouse provide evidence for a larger number of rearrangements than previously thought and shed some light on previously unknown features of mammalian evolution. In particular, they reveal extensive re-use of breakpoints from the same relatively short regions. Our analysis implies the existence of a large number of very short “hidden” synteny blocks that were invisible in comparative mapping data and were not taken into account in previous studies of chromosome evolution. These blocks are defined by closely located breakpoints and are often hard to detect. Our result is in conflict with the widely accepted random breakage model of chromosomal evolution. We suggest a new “fragile breakage” model of chromosome evolution that postulates that breakpoints are chosen from relatively short fragile regions that have much higher propensity for rearrangements than the rest of the genome.
Quartet methods for phylogeny reconstruction from gene orders - Dept. CS and Engin., Univ. South-Carolina, 2005
Cited by 4 (1 self)
Phylogenetic reconstruction from gene-rearrangement data has attracted increasing attention from biologists and computer scientists. Methods used in reconstruction include distance-based methods, parsimony methods using sequence-based encodings, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach; however, its exhaustive approach means that it can be applied only to small datasets of fewer than 15 taxa. While we have successfully scaled it up to 1,000 genomes by integrating it with a disk-covering method (DCM-GRAPPA), the recursive decomposition may need many levels of recursion to handle datasets with 1,000 or more genomes. We thus investigated quartet-based approaches, which directly decompose the datasets into subsets of four taxa each; such approaches have been well studied for sequence data, but not for gene-rearrangement data. We give an optimization algorithm for the NP-hard problem of computing optimal trees for each quartet, present a variation of the dyadic method (using heuristics to choose suitable short quartets), and use both in simulation studies. We find that our quartet-based method can handle more genomes than the base version of GRAPPA, thus enabling us to reduce the number of levels of recursion in DCM-GRAPPA, but is more sensitive to the rate of evolution, with error rates rapidly increasing when saturation is approached.
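The decomposition into four-taxon subsets can be sketched as follows. Note the substitution: the paper computes an optimal tree for each quartet, whereas this sketch picks the quartet topology with the standard four-point rule on pairwise distances, used here purely as a simple stand-in. The distance table and function names are hypothetical.

```python
from itertools import combinations


def all_quartets(taxa):
    """Enumerate every subset of four taxa."""
    return list(combinations(taxa, 4))


def quartet_topology(a, b, c, d, dist):
    """Pick the split xy|zw that minimises dist(x, y) + dist(z, w).

    `dist` maps frozenset({x, y}) to a pairwise distance. This is the
    four-point rule, not the per-quartet optimization of the paper.
    """
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(splits, key=lambda s: dist[frozenset(s[0])] + dist[frozenset(s[1])])


# Hypothetical distances consistent with the tree ((A,B),(C,D)).
dist = {frozenset(pair): 4 for pair in combinations("ABCD", 2)}
dist[frozenset("AB")] = 2
dist[frozenset("CD")] = 2

print(quartet_topology("A", "B", "C", "D", dist))
print(len(all_quartets(range(6))))
```

The number of quartets grows as n choose 4, which is one reason heuristics that select a subset of short quartets are attractive for large datasets.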
Improving inversion median computation using commuting reversals and cycle information - COMPARATIVE GENOMICS. VOLUME 4751, 2007
Cited by 4 (1 self)
In the past decade, genome rearrangements have attracted increasing attention from both biologists and computer scientists as a new type of data for phylogenetic analysis. Methods for reconstructing phylogeny from genome rearrangements include distance-based methods, MCMC methods and direct optimization methods. The latter, pioneered by Sankoff and extended with the software suites GRAPPA and MGR, is the most accurate approach, but is very limited due to the difficulty of its scoring procedure: it must solve multiple instances of the median problem to compute the score of a given tree. The median problem is known to be NP-hard and all existing solvers are extremely slow when the genomes are distant. In this paper, we present a new inversion median heuristic for unichromosomal genomes. The new method works by applying sets of reversals in a batch where all such reversals both commute and do not break the cycle of any other. Our testing using simulated datasets shows that this method is much faster than the leading solver for difficult datasets with only a slight accuracy penalty, yet retains better accuracy than other heuristics with comparable speed. This new method will dramatically increase the speed of current direct optimization methods and enables us to extend the range of their applicability to organellar and small nuclear genomes with more than 50 inversions along each edge. As a further improvement, this new method can very quickly produce reasonable solutions to problems with hundreds of genes.

