Results 1 - 10 of 105
BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark
- Proteins
, 2005
Cited by 48 (1 self)
ABSTRACT Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site
Kalign – an accurate and fast multiple sequence alignment algorithm
- BMC BIOINFORMATICS
, 2005
Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches
- Proteins
, 2005
Cited by 5 (0 self)
ABSTRACT Structural genomics is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy that is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the “Pfam5000” strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These strategies include complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random
Prot4EST: translating expressed sequence tags from neglected genomes
- BMC Bioinformatics
, 2004
"... Software ..."
Detecting non-coding selective pressure in coding regions
- BMC Evolutionary Biology 7(Suppl
, 2007
Cited by 4 (0 self)
Background: Comparative genomics approaches, where orthologous DNA regions are compared and inter-species conserved regions are identified, have proven extremely powerful for identifying non-coding regulatory regions located in intergenic or intronic regions. However, non-coding functional elements can also be located within coding regions, as is common for exonic splicing enhancers, some transcription factor binding sites, and RNA secondary structure elements affecting mRNA stability, localization, or translation. Since these functional elements are located in regions that are themselves highly conserved because they are coding for a protein, they generally escaped detection by comparative genomics approaches. Results: We introduce a comparative genomics approach for detecting non-coding functional elements located within coding regions. Codon evolution is modeled as a mixture of codon substitution models, where each component of the mixture describes the evolution of codons under a specific type of coding selective pressure. We show how to compute the posterior distribution of the entropy and parsimony scores under this null model of codon evolution. The method is applied to a set of growth hormone 1 orthologous mRNA sequences and a known exonic splicing element is detected. The analysis of a set of CORTBP2 orthologous genes reveals a region of several hundred base pairs under strong non-coding selective pressure whose function remains unknown.
Detecting conserved interaction patterns in biological networks
- Journal of Computational Biology
, 2006
Cited by 3 (2 self)
Molecular interaction data plays an important role in understanding biological processes at a modular level by providing a framework for understanding cellular organization, functional hierarchy, and evolutionary conservation. As the quality and quantity of network and interaction data increases rapidly, the problem of effectively analyzing this data becomes significant. Graph theoretic formalisms, commonly used for these analysis tasks, often lead to computationally hard problems due to their relation to subgraph isomorphism. This paper presents an innovative new algorithm, MULE, for detecting frequently occurring patterns and modules in biological networks. Using an innovative graph simplification technique based on ortholog contraction, which is ideally suited to biological networks, our algorithm renders these problems computationally tractable and scalable to large numbers of networks. We show, experimentally, that our algorithm can extract frequently occurring patterns in metabolic pathways and protein interaction networks from the KEGG, DIP, and BIND databases within seconds. When compared to existing approaches, our graph simplification technique can be viewed either as a pruning heuristic, or a closely related, but computationally simpler task. When used as a pruning heuristic, we show that our technique reduces effective graph sizes significantly, accelerating existing techniques by several orders of magnitude! Indeed, for most of the test cases, existing techniques could not even be applied without our pruning step. When used as a stand-alone analysis technique, MULE is shown to convey significant biological insights at near-interactive rates. The software, sample input graphs, and detailed results for comprehensive analysis of nine eukaryotic PPI networks are available at www.cs.purdue.edu/homes/koyuturk/mule. Key words: graph mining, frequent subgraph discovery, evolution, modular conservation.
BioMed Central
, 2006
Cited by 2 (0 self)
Research article Mariner mutagenesis of Brucella melitensis reveals genes with previously uncharacterized roles in virulence and survival
Large-Scale Prediction of Function Shift in Protein Families with a Focus on Enzymatic Function
Cited by 2 (0 self)
ABSTRACT Protein function shift can be predicted from sequence comparisons, either using positive selection signals or evolutionary rate estimation. None of the methods have been validated on large datasets, however. Here we investigate existing and novel methods for protein function shift prediction, and benchmark the accuracy against a large dataset of proteins with known enzymatic functions. Function change was predicted between subfamilies by identifying two kinds of sites in a multiple sequence alignment: Conservation-Shifting Sites (CSS), which are conserved in two subfamilies using two different amino acid types, and Rate-Shifting Sites (RSS), which have different evolutionary rates in two subfamilies. CSS were predicted by a new entropy-based method, and RSS using the Rate-Shift program. In principle, the more CSS and RSS between two subfamilies, the more likely a function shift between them. A test dataset was built by extracting subfamilies from Pfam with different EC numbers that belong to the same domain family. Subfamilies were generated automatically using a phylogenetic tree-based program, BETE. The dataset comprised 997 subfamily pairs with four or more members per subfamily. We observed a significant increase in CSS and RSS for subfamily comparisons with different EC numbers compared to cases with same EC numbers. The discrimination was better using RSS than CSS, and was more pronounced for larger families. Combining RSS and CSS by discriminant analysis improved classification accuracy to 71%. The method was applied to the Pfam database and the results are available at
Structure Determination of a New Protein From Backbone-Centered NMR Data and NMR-Assisted Structure Prediction
Cited by 1 (1 self)
ABSTRACT Targeting of proteins for structure determination in structural genomic programs often includes the use of threading and fold recognition methods to exclude proteins belonging to well-populated fold families, but such methods can still fail to recognize preexisting folds. The authors illustrate here a method in which limited amounts of structural data are used to improve an initial homology search and the data are subsequently used to produce a structure by data-constrained refinement of an identified structural template. The data used are primarily NMR-based residual dipolar couplings, but they also include additional chemical shift and backbone-nuclear Overhauser effect data. Using this methodology, a backbone structure was efficiently produced for a 10 kDa protein (PF1455) from Pyrococcus furiosus. Its relationship to existing structures and its probable function are discussed. Proteins 2006;65:480–489. © 2006 Wiley-Liss, Inc. Key words: protein structure prediction; structural genomics; residual dipolar couplings; Pyrococcus furiosus; simulated annealing

