Results 1 - 10
of
33
SCOPE: a probabilistic model for scoring tandem mass spectra against a peptide database
- Bioinformatics
, 2001
"... Proteomics, or the direct analysis of the expressed protein components of a cell, is critical to our understanding of cellular biological processes in normal and diseased tissue. A key requirement for its success is the ability to identify proteins in complex mixtures. Recent technological advances ..."
Abstract
-
Cited by 33 (4 self)
- Add to MetaCart
Proteomics, or the direct analysis of the expressed protein components of a cell, is critical to our understanding of cellular biological processes in normal and diseased tissue. A key requirement for its success is the ability to identify proteins in complex mixtures. Recent technological advances in tandem mass spectrometry has made it the method of choice for high-throughput identification of proteins. Unfortunately, the software for unambiguously identifying peptide sequences has not kept pace with the recent hardware improvements in mass spectrometry instruments. Critical for reliable high-throughput protein identification, scoring functions evaluate the quality of a match between experimental spectra and a database peptide. Current scoring function technology relies heavily on ad-hoc parameterization and manual curation by experienced mass spectrometrists. In this work, we propose a two-stage stochastic model for the observed MS/MS spectrum, given a peptide. Our model explicitly incorporates fragment ion probabilities, noisy spectra, and instrument measurement error. We describe how to compute this probability based score efficiently, using a dynamic programming technique. A prototype implementation demonstrates the effectiveness of the model. Contact:
On De Novo Interpretation of Tandem Mass Spectra for Peptide Identification
, 2003
"... The correct interpretation of tandem mass spectra is a difficult problem, even when it is limited to scoring peptides against a database. De novo sequencing is considerably harder, but critical when sequence databases are incomplete or not available. In this paper we build upon earlier work due to D ..."
Abstract
-
Cited by 17 (3 self)
- Add to MetaCart
The correct interpretation of tandem mass spectra is a difficult problem, even when it is limited to scoring peptides against a database. De novo sequencing is considerably harder, but critical when sequence databases are incomplete or not available. In this paper we build upon earlier work due to Dancik et al., and Chen et al. to provide a dynamic programming algorithm for interpreting de novo spectra. Our method can handle most of the commonly occurring ions, including a, b, y, and their neutral losses. Additionally, we shift the emphasis away from sequencing to assigning ion types to peaks. In particular, we introduce the notion of core interpretations, which allow us to give confidence values to individual peak assignments, even in the absence of a strong interpretation. Finally, we introduce a systematic approach to evaluating de novo algorithms as a function of spectral quality. We show that our algorithm, in particular the core-interpretation, is robust in the presence of measurement error, and low fragmentation probability.
SPIDER: Software for Protein identification from Sequence Tags with De Novo Sequencing Error
- J Bioinform Comput Biol
, 2004
"... For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting amino acid mutations, the sequences in a protein database. If the de n ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting amino acid mutations, the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same masses. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
Efficiency of database search for identification of mutated and modified proteins via mass spectrometry
- GENOME RES
, 2001
"... ..."
A hidden markov model based scoring function for tandem mass spectrometry
- In RECOMB 2005
"... An accurate scoring function for database search is crucial for peptide identification using tandem mass spectrometry. Although many mathematical models have been proposed to score peptides against tandem mass spectra, our method (called PepHMM, ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
An accurate scoring function for database search is crucial for peptide identification using tandem mass spectrometry. Although many mathematical models have been proposed to score peptides against tandem mass spectra, our method (called PepHMM,
The Paragon Algorithm, a Next Generation Search Engine That Uses Sequence Temperature Values and Feature Probabilities to Identify Peptides from Tandem Mass Spectra * □S
"... The Paragon TM Algorithm, a novel database search engine for the identification of peptides from tandem mass spectrometry data, is presented. Sequence Temperature Values are computed using a sequence tag algorithm, allowing the degree of implication by an MS/MS spectrum of each region of a database ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
The Paragon TM Algorithm, a novel database search engine for the identification of peptides from tandem mass spectrometry data, is presented. Sequence Temperature Values are computed using a sequence tag algorithm, allowing the degree of implication by an MS/MS spectrum of each region of a database to be determined on a continuum. Counter to conventional approaches, features such as modifications, substitutions, and cleavage events are modeled with probabilities rather than by discrete user-controlled settings to consider or not consider a feature. The use of feature probabilities in conjunction with Sequence Temperature Values allows for a very large increase in the effective search space with only a very small increase in the actual number of hypotheses that must be scored. The algorithm has a new kind of user
Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides
- Mol Cell Proteomics
, 2006
"... In mass spectrometry-based proteomics, frequently hundreds of thousands of MS/MS spectra are collected in a single experiment. Of these, a relatively small fraction is confidently assigned to peptide sequences, whereas the majority of the spectra are not further analyzed. Spectra are not assigned to ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
In mass spectrometry-based proteomics, frequently hundreds of thousands of MS/MS spectra are collected in a single experiment. Of these, a relatively small fraction is confidently assigned to peptide sequences, whereas the majority of the spectra are not further analyzed. Spectra are not assigned to peptides for diverse reasons. These include deficiencies of the scoring schemes implemented in the database search tools, sequence variations (e.g. single nucleotide polymorphisms) or omissions in the database searched, post-translational or chemical modifications of the peptide analyzed, or the observation of sequences that are not anticipated from the genomic sequence (e.g. splice forms, somatic rearrangement, and processed proteins). To increase the amount of information that can be extracted from proteomic MS/MS datasets
New data baseindependent, sequence tag-based scoring of peptide ms/ms data validates mowse scores, recovers below threshold data, singles out modified peptides, and assesses the quality of ms/ms techniques
- Mol. Cell. Proteomics
, 2005
"... The Mascot score (M-score) is one of the conventional validity measures in data base identification of peptides and proteins by MS/MS data. Although tremendously useful, M-score has a number of limitations. For the same MS/MS data, M-score may change if the protein data base is expanded. A low M-val ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
The Mascot score (M-score) is one of the conventional validity measures in data base identification of peptides and proteins by MS/MS data. Although tremendously useful, M-score has a number of limitations. For the same MS/MS data, M-score may change if the protein data base is expanded. A low M-value may not necessarily mean poor match but rather poor MS/MS quality. In addition M-score does not fully utilize the advantage of combined use of complementary fragmentation techniques collisionally activated dissociation (CAD) and electron capture dissociation (ECD). To address these issues, a new data baseindependent scoring method (S-score) was designed that is based on the maximum length of the peptide sequence tag provided by the combined CAD and ECD data. The quality of MS/MS spectra assessed by S-score allows poor data (39%
Protein identification by mass spectrometry: issues to be considered
- Mol. Cell. Proteomics
, 2004
"... During the past two decades, mass spectrometry has become established as the primary method for protein identification from complex mixtures of biological origin. This is largely attributable to the fortunate coincidence of instrumental advances that allow routine analysis of minute amounts (typical ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
During the past two decades, mass spectrometry has become established as the primary method for protein identification from complex mixtures of biological origin. This is largely attributable to the fortunate coincidence of instrumental advances that allow routine analysis of minute amounts (typically femtomoles) of involatile, polar compounds such as peptides in complex mixtures, with the rapid growth in genomic databases that are amenable to searching with mass spectrometry (MS) 1 data. Like many other developing fields in science, the creation of techniques and software tools and the initial generation and interpretation of data have been the domain of experts, people who are cognizant not only of the benefits of the methods but also of their actual and potential weaknesses. Now, as mass spectrometric techniques and proteomic tools become increasingly available and accessible,
De novo peptide sequencing and identification with precision mass spectrometry
- J. Proteome Res
, 2007
"... The recent proliferation of novel mass spectrometers such as Fourier transform, QTOF, and OrbiTrap marks a transition into the era of precision mass spectrometry, providing a 2 orders of magnitude boost to the mass resolution, as compared to low-precision ion-trap detectors. We investigate peptide d ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
The recent proliferation of novel mass spectrometers such as Fourier transform, QTOF, and OrbiTrap marks a transition into the era of precision mass spectrometry, providing a 2 orders of magnitude boost to the mass resolution, as compared to low-precision ion-trap detectors. We investigate peptide de novo sequencing by precision mass spectrometry and explore some of the differences when compared to analysis of low-precision data. We demonstrate how the dramatically improved performance of de novo sequencing with precision mass spectrometry paves the way for novel approaches to peptide identification that are based on direct sequence lookups, rather than comparisons of spectra to a database. With the direct sequence lookup, it is not only possible to search a database very efficiently, but also to use the database in novel ways, such as searching for products of alternative splicing or products of fusion proteins in cancer. Our de novo sequencing software is available for download at

