Using multiple alignments to improve gene prediction
- J. Comput. Biol
, 2005
Cited by 35 (2 self)
The multiple species de novo gene prediction problem can be stated as follows: given an alignment of genomic sequences from two or more organisms, predict the location and structure of all protein-coding genes in one or more of the sequences. Here, we present a new system, N-SCAN (a.k.a. TWINSCAN 3.0), for addressing this problem. N-SCAN has the ability to model dependencies between the aligned sequences, context-dependent substitution rates, and insertions and deletions in the sequences. An implementation of N-SCAN was created and used to generate predictions for the entire human genome. An analysis of the predictions reveals that N-SCAN’s predictive accuracy in human exceeds that of all previously published whole-genome de novo gene predictors. In addition, predictions were generated for the genome of the fruit fly Drosophila melanogaster to demonstrate the applicability of N-SCAN to invertebrate gene prediction.
Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining
, 2003
Phylogenetic motif detection by expectation-maximization on evolutionary mixtures
- Pac. Symp. Biocomput
, 2004
Cited by 28 (1 self)
The preferential conservation of transcription factor binding sites implies that non-coding sequence data from related species will prove a powerful asset to motif discovery. We present a unified probabilistic framework for motif discovery that incorporates evolutionary information. We treat aligned DNA sequence as a mixture of evolutionary models, for motif and background, and, following the example of the MEME program, provide an algorithm to estimate the parameters by Expectation-Maximization. We examine a variety of evolutionary models and show that our approach can take advantage of phylogenic information to avoid false positives and discover motifs upstream of groups of characterized target genes. We compare our method to traditional motif finding on only conserved regions. An implementation will be made available.
A Polynomial Time Approximation Scheme for Inferring Evolutionary Trees from Quartet Topologies and Its Application
- SIAM Journal on Computing
, 2000
Cited by 27 (1 self)
Inferring evolutionary trees has long been a challenging problem both for biologists and computer scientists. In recent years research has concentrated on the quartet method paradigm for inferring evolutionary trees. Quartet methods proceed by first inferring the evolutionary history for every set of four species (resulting in a set Q of inferred quartet topologies) and then recombining these inferred quartet topologies to form an evolutionary tree. This paper presents two results on the quartet method paradigm. The first is a polynomial time approximation scheme (PTAS) for recombining the inferred quartet topologies optimally. This is an important result since, to date, there have been no polynomial time algorithms with performance guarantees for quartet methods. To achieve this result the natural denseness of the set Q is exploited. The second result is a new technique, called quartet cleaning, that detects and corrects errors in the set Q with performance guarantees. This result h...
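The first stage described in the abstract above, building the set Q of quartet topologies, can be illustrated with a minimal sketch. The abstract does not say how each quartet is inferred, so this example swaps in the standard four-point method on a pairwise distance table as an assumption; the function names, the five-taxon distance table, and the variable `Q` are hypothetical and chosen only for illustration. It does not cover the recombination step, the PTAS, or quartet cleaning.

```python
from itertools import combinations

def dist(d, x, y):
    """Look up the symmetric distance between taxa x and y."""
    return d[frozenset((x, y))]

def infer_quartet(d, a, b, c, e):
    """Four-point method (assumed here, not taken from the paper):
    of the three ways to split four taxa into two pairs, pick the
    split whose two within-pair distances sum to the least.
    Ties are broken in favour of the first pairing listed."""
    pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
    return min(pairings, key=lambda p: dist(d, *p[0]) + dist(d, *p[1]))

def infer_all_quartets(d, taxa):
    """Build the set Q: one inferred topology per four-taxon subset."""
    return {frozenset(q): infer_quartet(d, *q)
            for q in combinations(sorted(taxa), 4)}

# Hypothetical additive distances for the tree ((A,B),C,(D,E))
# with every edge of length 1.
raw = {"AB": 2, "AC": 3, "AD": 4, "AE": 4, "BC": 3,
       "BD": 4, "BE": 4, "CD": 3, "CE": 3, "DE": 2}
d = {frozenset(pair): value for pair, value in raw.items()}
taxa = "ABCDE"

Q = infer_all_quartets(d, taxa)
for quartet in sorted(Q.values()):
    print(quartet)
```

With n taxa this enumerates all n-choose-4 subsets, which is the sense in which the set Q is dense.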
Adaptive Molecular Evolution
- In Balding, D., Bishop, M. and Cannings, C. (eds), Handbook of Statistical Genetics
, 2001
Cited by 27 (8 self)
Introduction. While Darwin's theory of evolution by natural selection is accepted by biologists for morphological traits, the importance of selection in molecular evolution has been much debated. The neutral theory (Kimura, 1983) maintains that most observed molecular variation (both diversity within species and divergence between species) is due to random fixation of mutations with fitness effects so small that random drift rather than natural selection dominates their fate. Population geneticists have developed a number of tests of neutrality (see Wayne and Simonsen, 1998, for a review). Those tests often easily reject the strictly neutral model when applied to real data. However, they are often unable to distinguish different forms of natural selection, or to demonstrate molecular adaptation. Up to now, the most convincing evidence of adaptive molecular evolution appears to have come from comparison of synonymous (silent) and non-synonymous (amino acid-changing) substitution rate
Parallel implementation and performance of fastDNAml - a program for maximum likelihood phylogenetic inference
- In Proceedings of SC2001
, 2001
Cited by 25 (0 self)
This paper describes the parallel implementation of fastDNAml, a program for the maximum likelihood inference of phylogenetic trees from DNA sequence data. Mathematical means of inferring phylogenetic trees have been made possible by the wealth of DNA data now available. Maximum likelihood analysis of phylogenetic trees is extremely computationally intensive. Availability of computer resources is a key factor limiting use of such analyses. fastDNAml is implemented in serial, PVM, and MPI versions, and may be modified to use other message passing libraries in the future. We have developed a viewer for comparing phylogenies. We tested the scaling behavior of fastDNAml on an IBM RS/6000 SP up to 64 processors. The parallel version of fastDNAml is one of very few computational phylogenetics codes that scale well. fastDNAml is available for download as source code or compiled for Linux or AIX.
The Ordinal Quartet Method
- Proceedings of the Second Annual International Conference on Computational Molecular Biology
, 1998
Cited by 24 (7 self)
The utility of ordinal assertions for inferring evolutionary trees from sequence data is examined. If M is a difference matrix derived from a set of sequences then "M(s; x) M(s; y)" is an ordinal assertion supported by M where s, x and y are sequences. Ordinal assertions are shown to be an accurate and robust source of evolutionary information. The following results are presented: A method for inferring quartet topology, the Ordinal Quartet Method, is introduced. Simulations are reported which demonstrate that the Ordinal Quartet Method is significantly more accurate and robust than the popular Weak Four-Point Condition Method. This improvement dramatically increases the accuracy of quartet recombination methods, such as the Short Quartet Method [10] and the Q Method [6], whose accuracy is critically dependent upon the ability to infer quartet topology accurately. It is also demonstrated that, unlike other quartet inference methods such as the Weak Four-Point Condition Method, the...
Coevolving protein residues: maximum likelihood identification and relationship to structure
- J. Mol. Biol
, 1999
"... There has been a great deal of recent research on ..."
On The Computational Complexity of Inferring Evolutionary Trees
, 1993
Cited by 19 (5 self)
The process of reconstructing evolutionary trees can be viewed formally as an optimization problem. Recently, decision problems associated with the most commonly used approaches to reconstructing such trees have been shown to be NP-complete [Day87, DJS86, DS86, DS87, GF82, Kri88, KM86]. In this thesis, a framework is established that incorporates all such problems studied to date. Within this framework, the NP-completeness results for decision problems are extended by applying theorems from [CT91, Gas86, GKR92, JVV86, KST89, Kre88, Sel91] to derive bounds on the computational complexity of several functions associated with each of these problems, namely: evaluation functions, which return the cost of the optimal tree(s); solution functions, which return an optimal tree; spanning functions, which return the number of optimal trees; enumeration functions, which systematically enumerate all optimal trees; and random-selection functions, which return a random...

