Reinvestigation of the Saccharomyces cerevisiae genome annotation by comparison to the genome of a related fungus: Ashbya gossypii
Genome Biol, 2003
Cited by 9 (1 self)
The electronic version of this article is the complete one and can be found online at
Protein Sequence Threading: Averaging over Structures
2002
Cited by 8 (5 self)
Multiple sequence alignments are a routine tool in protein fold recognition, but multiple structure alignments are computationally less cooperative. This work describes a method for protein sequence threading and sequence-to-structure alignments that uses multiple aligned structures, the aim being to improve models from protein threading calculations. Sequences are aligned into a field due to corresponding sites in homologous proteins. On the basis of a test set of more than 570 protein pairs, the procedure does improve alignment quality, although no more than averaging over sequences. For the force field tested, the benefit of structure averaging is smaller than that of adding sequence similarity terms or a contribution from secondary structure predictions. Although there is a significant improvement in the quality of sequence-to-structure alignments, this does not directly translate to an immediate improvement in fold recognition capability. Proteins 2002;47:496–505.
A generalized affine gap model significantly improves protein sequence alignment accuracy
Proteins, 2004
Cited by 8 (2 self)
Sequence alignment underpins common tasks in molecular biology, including genome annotation, molecular phylogenetics, and homology modeling. Fundamental to sequence alignment is the placement of gaps, which represent character insertions or deletions. We assessed the ability of a generalized affine gap cost model to reliably detect remote protein homology and to produce high-quality alignments. Generalized affine gap alignment with optimal gap parameters performed as well as the traditional affine gap model in remote homology detection. Evaluation of alignment quality showed that the generalized affine model aligns fewer residue pairs than the traditional affine model but achieves significantly higher per-residue accuracy. We conclude that generalized affine gap costs should be used when alignment accuracy carries more importance than aligned sequence length. Proteins 2005;58:329–338. © 2004 Wiley-Liss, Inc. Key words: remote homology detection; alignment quality; insertion; deletion; low-similarity region; unaligned
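The gap-cost discussion is easier to follow with a concrete baseline. The following is a minimal sketch of the traditional affine gap model, written as a Gotoh-style global alignment scorer. It uses a simple match/mismatch score instead of a protein substitution matrix. The parameter values (match=2, mismatch=-1, gap_open=-3, gap_extend=-1) are illustrative assumptions and do not come from the paper. The generalized affine model studied in the paper also allows stretches of residues from both sequences to be left unaligned. That extension is not shown here.

```python
NEG_INF = float("-inf")

def affine_global_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Global alignment score with traditional affine gap costs (Gotoh-style).

    A gap of length k scores gap_open + (k - 1) * gap_extend.
    Match/mismatch scoring and default parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # M: alignment ends with a[i-1] paired to b[j-1]
    # X: alignment ends with a[i-1] against a gap
    # Y: alignment ends with b[j-1] against a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):          # leading gap in b
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):          # leading gap in a
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open; continuing one pays only gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these defaults, one gap of length 2 costs -4, while two separate gaps of length 1 cost -6. Affine costs therefore prefer fewer, longer gaps. For example, `affine_global_score("ACGT", "AT")` aligns `A--T` and scores 0.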
Choosing the best heuristic for seeded alignment of DNA sequences
BMC Bioinformatics, 2006
Cited by 8 (1 self)
Background: Seeded alignment is an important component of algorithms for fast, large-scale DNA similarity search. A good seed matching heuristic can reduce the execution time of genomic-scale sequence comparison without degrading sensitivity. Recently, many types of seeds have been proposed to improve on the performance of traditional contiguous seeds as used in, e.g., NCBI BLASTN. Choosing among these seed types, particularly those that use information besides the presence or absence of matching residue pairs, requires practical guidance based on a rigorous comparison, including assessment of sensitivity, specificity, and computational efficiency. This work performs such a comparison, focusing on alignments in DNA outside widely studied coding regions. Results: We compare seeds of several types, including those allowing transition mutations rather than matches at fixed positions, those allowing transitions at arbitrary positions ("BLASTZ" seeds), and those using a more general scoring matrix. For each seed type, we use an extended version of our Mandala seed design software to choose seeds with optimized sensitivity for various levels of specificity. Our results show that, on a test set biased toward alignments of noncoding DNA, transition information significantly improves seed performance, while finer distinctions between
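A seed can be thought of as a pattern laid over an ungapped stretch of two sequences. The sketch below uses a hypothetical pattern alphabet. A '1' requires an exact match. A 'T' accepts either a match or a transition (A/G or C/T). A '0' is a don't-care position. This illustrates transition-constrained seeds at fixed positions only. It is not the Mandala software, and it does not cover BLASTZ-style seeds or general scoring matrices.

```python
# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def position_ok(symbol, a, b):
    """Check one seed position (hypothetical alphabet '1', 'T', '0') against bases a, b."""
    if symbol == "0":      # don't-care position
        return True
    if a == b:             # '1' and 'T' both accept an exact match
        return True
    # Only 'T' positions tolerate a mismatch, and only when it is a transition.
    return symbol == "T" and frozenset((a, b)) in TRANSITIONS

def seed_hits(x, y, seed):
    """Return offsets i where the seed matches the ungapped pair x[i:], y[i:]."""
    length = min(len(x), len(y))
    return [
        i
        for i in range(length - len(seed) + 1)
        if all(position_ok(s, x[i + k], y[i + k]) for k, s in enumerate(seed))
    ]
```

For example, the pattern `"1T1"` hits the pair `"AAA"`/`"AGA"` at offset 0 because A/G is a transition. The contiguous pattern `"111"` misses that pair.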
CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units
2009
Cited by 8 (1 self)
The expansion of the metazoan microRNA repertoire
The Students of Bioinformatics Computer Labs 2004 and 2005, 2006
Cited by 7 (0 self)
MicroRNAs have been identified as crucial regulators in both animals and plants. Here we report on a comprehensive comparative study of all known miRNA families in animals. We expand the MicroRNA Registry 6.0 by more than 1000 new homologs of miRNA precursors whose expression has been verified in at least one species. Using this uniform data basis, we analyze their evolutionary history in terms of individual gene phylogenies and in terms of preservation of genomic nearness across species. This allows us to reliably identify microRNA clusters that are derived from a common transcript. We identify three episodes of microRNA innovation that correspond to major developmental innovations: a class of about 20 miRNAs is common to protostomes and deuterostomes and might be related to the advent of bilaterians. A second large wave of innovations maps to the branch leading to the vertebrates. The third significant outburst of miRNA innovation coincides with placental (eutherian) mammals. In addition, we observe the expected expansion of the microRNA inventory due to genome duplications in early vertebrates and in an ancestral teleost. The non-local duplications in the vertebrate ancestor are predated by local (tandem) duplications leading to the formation of about a dozen ancient microRNA clusters.
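Within a single genome, candidate clusters can be found by grouping loci that lie close together on the same chromosome and strand. The sketch below does that. The tuple layout and the 10,000-nt `max_gap` threshold are hypothetical choices and are not taken from the study. The study also checks whether this nearness is preserved across species, and this single-genome sketch does not attempt that.

```python
from itertools import groupby

def cluster_loci(loci, max_gap=10_000):
    """Group miRNA loci lying close together on the same chromosome and strand.

    loci: iterable of (name, chromosome, strand, start) tuples (hypothetical layout).
    max_gap: hypothetical maximum distance (nt) between consecutive loci in a cluster.
    Returns clusters as lists of names, keeping only clusters with 2+ loci.
    """
    ordered = sorted(loci, key=lambda locus: (locus[1], locus[2], locus[3]))
    clusters = []
    # Sorting first makes all loci of one (chromosome, strand) pair consecutive.
    for _, group in groupby(ordered, key=lambda locus: (locus[1], locus[2])):
        current, last_start = [], None
        for name, _chrom, _strand, start in group:
            # Close the current cluster when the next locus is too far away.
            if last_start is not None and start - last_start > max_gap:
                if len(current) > 1:
                    clusters.append(current)
                current = []
            current.append(name)
            last_start = start
        if len(current) > 1:
            clusters.append(current)
    return clusters
```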
The protein non-folding problem: Amino acid determinants of intrinsic order and
2001
Cited by 6 (2 self)
To investigate the determinants of protein order and disorder, three primary and one derivative database of intrinsically disordered proteins were compiled. The segments in each primary database were characterized by one of the following: X-ray crystallography, nuclear magnetic resonance (NMR), or circular dichroism (CD). The derivative database was based on homology. The three primary disordered databases have a combined total of 157 proteins or segments of length ≥ 30 with 18,010 residues, while the derivative database contains 572 proteins from 32 families with 52,688 putatively disordered residues. For the four disordered databases, the amino acid compositions were compared with those from a database of ordered structure. Relative to the ordered protein, the intrinsically disordered segments in all four
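The composition comparison can be sketched in two steps. First, compute pooled per-residue frequencies for a disordered set and an ordered set. Then take the relative difference (C_disordered - C_ordered) / C_ordered for each amino acid. Pooling residues across sequences and this particular normalization are assumptions, because the abstract does not state the exact measure. Both sets must contain at least one standard residue.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(sequences):
    """Pooled fraction of each standard amino acid across all sequences."""
    counts = Counter()
    for seq in sequences:
        # Non-standard symbols (e.g. X) are ignored.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def relative_difference(disordered, ordered):
    """(C_disordered - C_ordered) / C_ordered for residues present in the ordered set."""
    comp_d = composition(disordered)
    comp_o = composition(ordered)
    return {aa: (comp_d[aa] - comp_o[aa]) / comp_o[aa]
            for aa in AMINO_ACIDS if comp_o[aa] > 0}
```

A positive value means the residue is enriched in the disordered set relative to the ordered set. A negative value means it is depleted.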
A computational pipeline for protein structure prediction and analysis at genome scale
Bioinformatics, 1985
Cited by 5 (2 self)
The tertiary (3D) structure of a protein contains the essential information for understanding the biological function of the protein at the molecular and cellular levels. Traditionally, protein 3D structures are solved using experimental techniques such as X-ray crystallography or nuclear magnetic resonance (NMR). While these experimental techniques have been the main workhorse for protein structure studies in the past few decades, it is becoming increasingly apparent that they alone cannot keep up with the production rate of protein sequences resulting from worldwide genome sequencing and bioinformatics efforts. Fortunately, computational techniques for protein structure prediction have matured to such a level that they can complement the existing experimental techniques. In this paper, we present an automated pipeline for protein structure prediction. The centerpiece of the pipeline is a threading-based protein structure prediction system, called PROSPECT, which we have been developing for the past few years. The pipeline consists of seven logical phases, utilizing a dozen tools: (1) preprocessing to identify protein domains in the input sequence, (2) compilation of functional and structural information about a target protein through database search, (3) protein triage to determine which process and
A fast parallel algorithm for finding the longest common sequence of multiple biosequences
BMC Bioinformatics 2006, 7(Suppl 4):S4
Cited by 3 (0 self)
Background. Biological sequences can be represented as sequences of symbols. For instance, a protein is a sequence over 20 different letters (amino acids), and DNA sequences (genes) can be represented as sequences over four letters, A, C, G and T, corresponding to the four nucleotides that make up DNA. When a new biosequence is found, we want to know which other sequences it is most similar to. Sequence comparison has been used successfully to establish the link between cancer-causing genes and a gene involved in normal growth and development. One way of detecting the similarity of two or more sequences is to find their longest common subsequence (LCS). Searching for the LCS of biosequences is one of the most important tasks in bioinformatics. Here, while guaranteeing an exact LCS result, we present a parallel longest common subsequence algorithm named FAST_LCS, based on a set of novel pruning techniques, to improve the speed of finding the LCS.
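For two sequences, the LCS can be computed with the textbook O(nm) dynamic program below. This is not FAST_LCS. The abstract describes a parallel algorithm for multiple sequences that relies on pruning techniques it does not detail, so this sketch only illustrates the problem being solved.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b (classic dynamic program)."""
    n, m = len(a), len(b)
    # L[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk forward through the table to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)
```

The table has (n + 1)(m + 1) entries. Extending this approach directly to k sequences makes the table grow with the product of all k lengths. That growth is the kind of cost that multi-sequence methods such as FAST_LCS set out to avoid.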
Datagrid, Prototype of a Biomedical Grid
Methods of Information in Medicine, 2003
Cited by 3 (1 self)
In this paper, we briefly present the DataGrid project, explain the relevance of the grid concept for genomics and medical imaging, and describe the first applications being deployed on DataGrid as a proof of concept of a biomedical grid.

