Entropia: Architecture and Performance of an Enterprise Desktop Grid System
, 2003
Cited by 111 (6 self)
The exploitation of idle cycles on pervasive desktop PC systems offers the opportunity to increase the available computing power by orders of magnitude (10x - 1000x). However, for desktop PC distributed computing to be widely accepted within the enterprise, the systems must achieve high levels of efficiency, robustness, security, scalability, manageability, unobtrusiveness, and openness/ease of application integration. We describe the Entropia distributed computing system as a case study, detailing its internal architecture and philosophy in attacking these key problems. Key aspects of the Entropia system include the use of: 1) binary sandboxing technology for security and unobtrusiveness, 2) a layered architecture for efficiency, robustness, scalability and manageability, and 3) an open integration model to allow applications from many sources to be incorporated. Typical applications for the Entropia system include molecular docking, sequence analysis, chemical structure modeling, and risk management. The applications come from a diverse set of domains including virtual screening for drug discovery, genomics for drug targeting, material property prediction, and portfolio management. In all cases, these applications scale to many thousands of nodes and have no dependences between tasks. We present representative performance results from several applications that illustrate the high performance, linear scaling, and overall capability presented by the Entropia system.
Mismatch String Kernels for SVM Protein Classification
Cited by 100 (14 self)
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.
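The similarity measure described in this abstract can be illustrated with a minimal brute-force sketch. It assumes the kernel value is the inner product of two feature vectors, where each k-mer in a sequence contributes a count to every k-length string within m mismatches of it. The function names, the 20-letter amino acid alphabet constant, and the direct neighbor enumeration are illustrative choices; the paper itself computes the kernel with a mismatch tree, which this sketch does not reproduce.

```python
from collections import Counter
from itertools import combinations, product

# Assumed alphabet: the 20 standard amino acids (hypothetical default).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def neighbors(kmer, m, alphabet=AMINO_ACIDS):
    """Return the set of strings within Hamming distance m of kmer."""
    result = {kmer}
    for d in range(1, m + 1):
        # Choose which d positions to substitute, then which letters to use.
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                result.add("".join(candidate))
    return result


def mismatch_features(seq, k, m, alphabet=AMINO_ACIDS):
    """Count, for each k-length string, the k-mers of seq within m mismatches."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for beta in neighbors(seq[i:i + k], m, alphabet):
            features[beta] += 1
    return features


def mismatch_kernel(x, y, k, m, alphabet=AMINO_ACIDS):
    """Unnormalized kernel value: inner product of the two feature maps."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[beta] for beta, count in fx.items())
```

Enumerating neighbors directly costs time exponential in m, so this sketch is only practical for short k and small m; it is meant to make the definition concrete, not to match the computational savings the abstract reports.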
Review: Protein Secondary Structure Prediction Continues to Rise
- J. Struct. Biol
, 2001
Cited by 92 (13 self)
... f prediction accuracy? We shall see. 2001 Academic Press. INTRODUCTION. History. Linus Pauling correctly guessed the formation of helices and strands (14, 15) (and falsely hypothesized other structures). Three years before Pauling's guess was verified by the publications of the first X-ray structures (16, 17), one group had already ventured to predict secondary structure from sequence (18). The first-generation prediction methods following in the 1960s and 1970s were all based on single amino acid propensities (19). The second-generation methods dominating the scene until the early 1990s used propensities for segments of 3-51 adjacent residues (19). Basically any imaginable theoretical algorithm had been applied to the problem of predicting secondary structure from sequence. However, it seemed that prediction accuracy stalled at levels slightly above 60% (percentage of residues predicted correctly in one of the three states: helix, strand, and other). The reason for this limit was the
Adaptive Computing on the Grid Using AppLeS
, 2003
Cited by 90 (7 self)
Ensembles of distributed, heterogeneous resources, also known as Computational Grids, are emerging as critical platforms for high-performance and resource-intensive applications. Such platforms provide the potential for applications to aggregate enormous bandwidth, computational power, memory, secondary storage, and other resources during a single execution. However, achieving this performance potential in dynamic, heterogeneous environments is challenging. Recent experience with distributed applications indicates that adaptivity is fundamental to achieving application performance in dynamic Grid environments. The AppLeS (Application Level Scheduling) project provides a methodology, application software, and software environments for adaptively scheduling and deploying applications in dynamic, heterogeneous, multi-user Grid environments. In this paper, we discuss the AppLeS project and outline our results.
Mismatch string kernels for discriminative protein classification
- Bioinformatics
, 2004
Cited by 90 (7 self)
Motivation: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. Results: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP data sets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies. Availability: SVM software is publicly available at
Improving the Prediction of Protein Secondary Structure in Three and Eight Classes Using Recurrent Neural Networks and Profiles
, 2001
Cited by 87 (21 self)
Secondary structure predictions are increasingly becoming the workhorse for several methods aiming at predicting protein structure and function. Here we use ensembles of bidirectional recurrent neural network architectures, PSI-BLAST-derived profiles, and a large nonredundant training set to derive two new predictors: (a) the second version of the SSpro program for secondary structure classification into three categories and (b) the first version of the SSpro8 program for secondary structure classification into the eight classes produced by the DSSP program. We describe the results of three different test sets on which SSpro achieved a sustained performance of about 78% correct prediction. We report confusion matrices, compare PSI-BLAST to BLAST-derived profiles, and assess the corresponding performance improvements. SSpro and SSpro8 are implemented as web servers, available together with other structural feature predictors at: http://promoter.ics.uci.edu/BRNN-PRED/. Proteins 2002;47:228-235.
PROBCONS: Probabilistic consistency-based multiple sequence alignment
- Genome Res
, 2005
Cited by 86 (5 self)
To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce probabilistic consistency, a novel scoring function for multiple sequence comparisons. We present PROBCONS, a practical tool for progressive protein multiple sequence alignment based on probabilistic consistency, and evaluate its performance on several standard alignment benchmark datasets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, PROBCONS achieves statistically significant improvement over other leading methods while maintaining practical speed. PROBCONS is publicly available as a web resource. Source code and executables are available under the GNU Public License at
The relationship between protein structure and function: a comprehensive survey with application to the yeast genome
- J Mol Biol
Cited by 84 (22 self)
(Version ff225rev sent to the Journal of Molecular Biology) For most proteins in the genome databases, function is predicted via sequence comparison. In spite of the popularity of this approach, the extent to which it can be reliably applied is unknown. We address this issue by systematically investigating the relationship between protein function and structure. We focus initially on enzymes classified by the Enzyme Commission (EC) and relate these to structurally classified proteins in the SCOP database. We find that the major SCOP fold classes have different propensities to carry out certain broad categories of functions. For instance, alpha/beta folds are disproportionately associated with enzymes, especially transferases and hydrolases, and all-alpha and small folds with non-enzymes, while alpha+beta folds have an equal tendency either way. These observations for the database overall are largely true for specific genomes. We focus, in particular, on yeast, analyzing it with many classifications in addition to SCOP and EC (i.e. COGs, CATH, MIPS), and find clear tendencies for fold-function association, across a broad spectrum of functions. Analysis with the COGs scheme also suggests that the functions of the most ancient proteins are more evenly distributed among different structural classes
HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins
- Journal of Molecular Biology
, 2000
GenBank: update
- Nucleic Acids Res
, 2004
Cited by 78 (3 self)
GenBank® is a comprehensive database that contains publicly available DNA sequences for more than 140 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the BankIt (web) or Sequin program, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps ensure worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI home page at:

