New data base-independent, sequence tag-based scoring of peptide MS/MS data validates Mowse scores, recovers below threshold data, singles out modified peptides, and assesses the quality of MS/MS techniques
- Mol. Cell. Proteomics
, 2005
Cited by 5 (2 self)
The Mascot score (M-score) is one of the conventional validity measures in data base identification of peptides and proteins by MS/MS data. Although tremendously useful, M-score has a number of limitations. For the same MS/MS data, M-score may change if the protein data base is expanded. A low M-value may not necessarily mean a poor match but rather poor MS/MS quality. In addition, M-score does not fully utilize the advantage of the combined use of two complementary fragmentation techniques, collisionally activated dissociation (CAD) and electron capture dissociation (ECD). To address these issues, a new data base-independent scoring method (S-score) was designed that is based on the maximum length of the peptide sequence tag provided by the combined CAD and ECD data. The quality of MS/MS spectra assessed by S-score allows poor data (39%
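To make the idea of a "maximum sequence tag length" concrete, here is a minimal Python sketch. It is not the paper's S-score: the function name `longest_tag`, the tolerance, and the reduced residue table are illustrative assumptions, and it assumes the CAD and ECD fragment masses have already been converted to one common mass ladder so that the two peak lists can simply be pooled.

```python
# Monoisotopic residue masses (Da) for a few amino acids only;
# a real table would cover all 20 residues (assumption: reduced for brevity).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "V": 99.06841, "L": 113.08406}

def longest_tag(peaks, tol=0.01):
    """Return the longest residue string readable from mass gaps between peaks."""
    peaks = sorted(peaks)
    best = [""] * len(peaks)  # best[i]: longest tag that ends at peak i
    for i, hi in enumerate(peaks):
        for j in range(i):
            gap = hi - peaks[j]
            for aa, mass in RESIDUE_MASS.items():
                # Extend the tag ending at peak j if the gap matches a residue.
                if abs(gap - mass) <= tol and len(best[j]) + 1 > len(best[i]):
                    best[i] = best[j] + aa
    return max(best, key=len, default="")

# Hypothetical peak lists, assumed to be on the same mass ladder.
cad_peaks = [100.0, 157.02146, 228.05857]
ecd_peaks = [228.05857, 315.0906]
combined_tag = longest_tag(cad_peaks + ecd_peaks)
```

In this toy input each technique alone yields a shorter tag than the pooled peak list, which is the effect the abstract attributes to combining complementary fragmentation data.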
Protein identification by mass spectrometry: issues to be considered
- Mol. Cell. Proteomics
, 2004
Cited by 4 (0 self)
During the past two decades, mass spectrometry has become established as the primary method for protein identification from complex mixtures of biological origin. This is largely attributable to the fortunate coincidence of instrumental advances that allow routine analysis of minute amounts (typically femtomoles) of involatile, polar compounds such as peptides in complex mixtures, with the rapid growth in genomic databases that are amenable to searching with mass spectrometry (MS) data. Like many other developing fields in science, the creation of techniques and software tools and the initial generation and interpretation of data have been the domain of experts, people who are cognizant not only of the benefits of the methods but also of their actual and potential weaknesses. Now, as mass spectrometric techniques and proteomic tools become increasingly available and accessible,
Algorithmic complexity of protein identification: Combinatorics of weighted strings
- Discrete Applied Mathematics, Special Issue on Combinatorics of Searching, Sorting, and Coding (2002)
, 2004
Cited by 2 (0 self)
We investigate a problem from computational biology: Given a constant size alphabet $A$ with a weight function $\mu : A \to \mathbb{R}^+$, find an efficient data structure and query algorithm solving the following problem: For a weight $M \in \mathbb{R}^+$ and a string $\sigma$ over $A$, decide whether $\sigma$ contains a substring with weight $M$ (ONE STRING MASS FINDING PROBLEM). If the answer is yes, then we may in addition require a witness, i.e. indices $i \le j$ such that the substring beginning at position $i$ and ending at position $j$ has weight $M$. We allow preprocessing of the string, and measure efficiency in two parameters: storage space required for the preprocessed data, and running time of the query algorithm for given $M$. We are interested in data structures and algorithms requiring subquadratic storage space and sublinear query time, where we measure the input size as the length $n$ of the input string. We present two efficient algorithms: LOOKUP solves the problem with $O(n)$ space and $O(\frac{n}{\log n} \cdot \log\log n)$ time; INTERVAL solves the problem for binary alphabets with $O(n)$ space in $O(\log n)$ time. We sketch a third algorithm, CLUSTER, which can be adjusted for a space time tradeoff but for which we do not yet have a resource analysis. We introduce a function on weighted strings which is closely related to the analysis of algorithms for the ONE STRING MASS FINDING PROBLEM: The number of different submasses of a weighted string. We present several properties of this function, including upper and lower bounds. Finally, we introduce two more general variants of the problem and sketch how algorithms may be extended for these variants.
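For orientation, the problem statement itself can be illustrated with a simple baseline in Python. This is not LOOKUP, INTERVAL, or CLUSTER: it is the straightforward linear-time scan with no preprocessing, which works because all weights are positive. The function name and the toy integer weight function `mu` are assumptions made for the example; indices in the returned witness are 0-based.

```python
def substring_with_weight(sigma, mu, target):
    """Return a witness (i, j) such that sigma[i..j] has weight `target`, else None.

    Assumes every weight in `mu` is strictly positive, so a sliding window
    that only ever grows on the right and shrinks on the left is sufficient.
    """
    lo, total = 0, 0
    for hi, letter in enumerate(sigma):
        total += mu[letter]
        # Drop letters from the left while the window is too heavy.
        while total > target and lo <= hi:
            total -= mu[sigma[lo]]
            lo += 1
        if total == target and lo <= hi:
            return lo, hi
    return None

mu = {"a": 1, "b": 3}          # hypothetical weight function
sigma = "abbab"                # letter weights: 1, 3, 3, 1, 3
witness = substring_with_weight(sigma, mu, 10)
```

Each query with this scan costs time linear in the string length, which is exactly the bound the abstract's data structures aim to beat by spending space on preprocessing.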
Generating Peptide Candidates from Amino-Acid Sequence Databases for Protein Identification via Mass Spectrometry
- In Proceedings of the Second International Workshop on Algorithms in Bioinformatics
, 2002
Cited by 2 (1 self)
Protein identification via mass spectrometry forms the foundation of high-throughput proteomics. Tandem mass spectrometry, when applied to a complex mixture of peptides, selects and fragments each peptide to reveal its amino-acid sequence structure. The successful analysis of such an experiment typically relies on amino-acid sequence databases to provide a set of biologically relevant peptides to examine. A key subproblem, then, for amino-acid sequence database search engines that analyze tandem mass spectra is to efficiently generate all the peptide candidates from a sequence database with mass equal to one of a large set of observed peptide masses. We demonstrate that to solve the problem efficiently, we must deal with substring redundancy in the amino-acid sequence database and focus our attention on looking up the observed peptide masses quickly. We show that it is possible, with some preprocessing and memory overhead, to solve the peptide candidate generation problem in time asymptotically proportional to the size of the sequence database and the number of peptide candidates output.
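The candidate generation subproblem can be sketched naively in Python as follows. This is not the paper's algorithm and does not achieve its asymptotic bound; it only shows the two ingredients the abstract names, skipping substrings that were already reported and looking up observed masses by binary search. The function name, the tolerance, and the rounded toy residue masses are assumptions, and peptide mass is taken as the plain sum of residue masses (water and charge are ignored).

```python
from bisect import bisect_left

def peptide_candidates(sequences, residue_mass, observed, tol=0.01):
    """Map each distinct matching substring to the observed mass it matches."""
    observed = sorted(observed)
    if not observed:
        return {}
    max_mass = observed[-1] + tol
    hits = {}
    for seq in sequences:
        for start in range(len(seq)):
            mass = 0.0
            for end in range(start, len(seq)):
                mass += residue_mass[seq[end]]
                if mass > max_mass:
                    break  # masses are positive, so longer substrings are too heavy
                peptide = seq[start:end + 1]
                if peptide in hits:
                    continue  # substring redundancy: already reported
                # Binary search for the smallest observed mass >= mass - tol.
                k = bisect_left(observed, mass - tol)
                if k < len(observed) and observed[k] <= mass + tol:
                    hits[peptide] = observed[k]
    return hits

toy_mass = {"G": 57.0, "A": 71.0, "S": 87.0}   # rounded toy values, not real masses
database = ["GASG", "AGAS"]                     # hypothetical sequence database
candidates = peptide_candidates(database, toy_mass, [128.0, 215.0])
```

The nested loops make this quadratic in the sequence length in the worst case; the early `break` keeps it practical only when observed masses bound the peptide length.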
Predictive methods using protein sequences
- Bioinformatics, A Practical Guide to the Analysis of Genes and Proteins, chapter 11
, 1998
Cited by 1 (0 self)
The discussions of databases and information retrieval in earlier chapters of this book document the tremendous explosion in the amount of sequence information available in a variety of public databases. As we have already seen with nucleotide sequences, all protein sequences, whether determined directly or through the translation of an open reading frame in a nucleotide sequence, contain intrinsic information of value in determining their structure or function. Unfortunately, experiments aimed at extracting such information cannot keep pace with the rate at which raw sequence data are being produced. Techniques such as circular dichroism spectroscopy, optical rotatory dispersion, X-ray crystallography, and nuclear magnetic resonance are extremely powerful in determining structural features, but their execution requires many hours of highly skilled, technically demanding work. The gap in information becomes obvious in comparisons of the size of the protein sequence and structure databases; as of this writing, there were 87,143 protein entries (Release 39.0) in SWISS-PROT but only 12,624 structure entries (July, 2000) in PDB. Attempts to close the gap
Advances and challenges in liquid chromatography-mass spectrometry-based proteomics profiling for clinical applications
- Mol Cell Proteomics
, 2006
Cited by 1 (0 self)
Recent advances in proteomics technologies provide tremendous opportunities for biomarker-related clinical applications; however, the distinctive characteristics of human biofluids such as the high dynamic range in protein abundances and extreme complexity of the proteomes present tremendous challenges. In this review we summarize recent advances in LC-MS-based proteomics profiling and its applications in clinical proteomics as well as discuss the major challenges associated with implementing these technologies for more effective candidate biomarker discovery. Developments in immunoaffinity depletion and various fractionation approaches in combination with substantial improvements in LC-MS platforms have enabled the plasma proteome to be profiled with considerably greater dynamic range of coverage,
Phosphoproteomics of the Arabidopsis Plasma Membrane and a New Phosphorylation Site Database
Functional genomic technologies are generating vast amounts of data describing the presence of transcripts or proteins in plant cells. Together with classical genetics, these approaches broaden our understanding of the gene products required for specific responses. Looking to the future, the focus of research must shift to the dynamic aspects of biology: molecular mechanisms of function and regulation. Phosphorylation is a key regulatory factor in all aspects of plant biology, but it is difficult, if not impossible, for most researchers to identify in vivo phosphorylation sites within their proteins of interest. We have developed a large-scale strategy for the isolation of phosphopeptides and identification by mass spectrometry (Nühse et al., 2003b). Here, we describe the identification of more than 300 phosphorylation sites from Arabidopsis thaliana plasma membrane proteins. These data will be a valuable resource for many fields of plant biology and overcome a major impediment to the elucidation of signal transduction pathways. We present an analysis of the characteristics of phosphorylation sites, their conservation among orthologs and paralogs, and the existence of putative motifs surrounding the sites. These analyses yield general principles for predicting other phosphorylation sites in plants and provide indications of specificity determinants for responsible kinases. In addition, more than 50 sites were mapped on receptor-like kinases and revealed an unexpected complexity of regulation. Finally, the data also provide empirical evidence on the topology of transmembrane proteins. This information indicates that prediction programs incorrectly identified the cytosolic portion of the protein in 25% of the transmembrane proteins found in this study. All data are deposited in a new searchable database for plant phosphorylation sites maintained by PlantsP
Commentary re: Proteomics-based Identification of RS/DJ-1 as a Novel Circulating Tumor Antigen
With the progress of the human genome project, large-scale analysis of proteins within a single experiment, called proteomics, has gained much attention. Mass spectrometry and related techniques have developed rapidly with genome database availability. Using mass spectrometry techniques, one can identify proteins in the database without going through the tedious and time-consuming traditional techniques of HPLC peptide mapping and then Edman degradation, oligonucleotide synthesis, PCR, and gene cloning. Thus, mass spectrometry techniques are gaining popularity as a versatile method for protein identification. Compared with traditional protein sequencing techniques, mass spectrometry analysis of proteins is much more sensitive and faster. Now that these techniques are available, the question is, “How can we apply them to biomedical science?” In this issue, Le Naour et al. (1) apply proteomics-based

