Results 1 - 10 of 23
Mutation-Tolerant Protein Identification by Mass Spectrometry
2000
Cited by 31 (5 self)
Database search in tandem mass spectrometry is a powerful tool for protein identification. High-throughput spectral acquisition raises the problem of dealing with genetic variation and peptide modifications within a population of related proteins. A method that cross-correlates and clusters related spectra in large collections of uncharacterized spectra (i.e., from normal and diseased individuals) would be very valuable in functional proteomics. This problem is far from simple, since very similar peptides may have very different spectra. We introduce a new notion of spectral similarity that allows one to identify related spectra even if the corresponding peptides have multiple modifications or mutations. Based on this notion, we developed a new algorithm for mutation-tolerant database search as well as a method for cross-correlating related uncharacterized spectra.
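The abstract does not define its similarity measure, so the following is only a toy illustration of why plain peak matching breaks down when a peptide is modified. It is a sketch under stated assumptions, not the paper's method: the function name `shared_peaks`, the `offset` parameter, and the peak lists are all hypothetical. A modification shifts every fragment peak that contains the modified residue, so counting matches only at offset zero misses half of the evidence.

```python
def shared_peaks(spec_a, spec_b, offset=0.0, tol=0.01):
    """Count peaks a in spec_a that have a partner b in spec_b with b - a ~ offset."""
    return sum(
        any(abs((b - a) - offset) <= tol for b in spec_b)
        for a in spec_a
    )


# Hypothetical peak lists: the second spectrum carries a +16 Da shift
# on every peak after the second one.
unmodified = [100.0, 200.0, 300.0, 400.0]
modified = [100.0, 200.0, 316.0, 416.0]

exact = shared_peaks(unmodified, modified)                  # matches at offset 0
shifted = shared_peaks(unmodified, modified, offset=16.0)   # matches at offset +16
print(exact, shifted)
```

With these made-up peaks, two peaks match directly and the other two match only once the 16 Da shift is allowed for.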
On De Novo Interpretation of Tandem Mass Spectra for Peptide Identification
2003
Cited by 17 (3 self)
The correct interpretation of tandem mass spectra is a difficult problem, even when it is limited to scoring peptides against a database. De novo sequencing is considerably harder, but critical when sequence databases are incomplete or not available. In this paper we build upon earlier work by Dančík et al. and Chen et al. to provide a dynamic programming algorithm for the de novo interpretation of spectra. Our method can handle most of the commonly occurring ions, including a, b, y, and their neutral losses. Additionally, we shift the emphasis away from sequencing to assigning ion types to peaks. In particular, we introduce the notion of core interpretations, which allow us to give confidence values to individual peak assignments, even in the absence of a strong interpretation. Finally, we introduce a systematic approach to evaluating de novo algorithms as a function of spectral quality. We show that our algorithm, in particular the core interpretation, is robust in the presence of measurement error and low fragmentation probability.
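To make the a, b, and y ion types mentioned above concrete, the sketch below computes their singly charged m/z values for a short peptide. It is not the paper's dynamic programming algorithm. The function name `fragment_ions` is hypothetical, the mass table is a small subset of standard monoisotopic residue masses, and neutral losses are left out for brevity.

```python
# Standard monoisotopic residue masses in Da (subset, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
PROTON = 1.007276   # mass of the charge-carrying proton
WATER = 18.010565   # H2O retained by the C-terminal (y) fragment
CO = 27.994915      # an a ion is a b ion minus CO


def fragment_ions(peptide):
    """Return singly charged a, b, and y ion m/z values for each cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {"a": [], "b": [], "y": []}
    for i in range(1, len(masses)):
        prefix = sum(masses[:i])   # residues on the N-terminal side of the cut
        suffix = sum(masses[i:])   # residues on the C-terminal side of the cut
        ions["b"].append(prefix + PROTON)
        ions["a"].append(prefix + PROTON - CO)
        ions["y"].append(suffix + WATER + PROTON)
    return ions


ions = fragment_ions("GAK")
print(ions["b"], ions["y"])
```

Assigning an ion type to an observed peak then amounts to asking which of these predicted values, if any, it matches within the instrument's mass tolerance.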
SPIDER: Software for Protein Identification from Sequence Tags with De Novo Sequencing Error
J Bioinform Comput Biol, 2004
Cited by 14 (0 self)
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then matched, allowing for amino acid mutations, against the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
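The error described above, where one segment is replaced by another of nearly the same mass, can be seen with a few standard monoisotopic residue masses. This is a minimal sketch, not SPIDER's matching algorithm. The helper names `segment_mass` and `similar_mass` and the 0.02 Da tolerance are assumptions chosen for illustration.

```python
# Standard monoisotopic residue masses in Da (subset, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "N": 114.04293,
    "Q": 128.05858, "K": 128.09496,
}


def segment_mass(segment):
    """Sum of residue masses for a run of amino acids."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def similar_mass(seg_a, seg_b, tol=0.02):
    """True if the two segments differ in mass by at most tol Da."""
    return abs(segment_mass(seg_a) - segment_mass(seg_b)) <= tol


print(similar_mass("GG", "N"))   # two glycines weigh almost exactly one asparagine
print(similar_mass("GA", "Q"))   # glycine + alanine versus glutamine
print(similar_mass("GG", "Q"))   # about 14 Da apart
```

A de novo sequencer that reads `GG` where the true peptide has `N` produces a tag that is wrong as a string but right in mass, which is why tag matching has to tolerate such swaps in addition to genuine mutations.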
Efficiency of database search for identification of mutated and modified proteins via mass spectrometry
Genome Res, 2001
The Paragon Algorithm, a Next Generation Search Engine That Uses Sequence Temperature Values and Feature Probabilities to Identify Peptides from Tandem Mass Spectra
Cited by 6 (0 self)
The Paragon™ Algorithm, a novel database search engine for the identification of peptides from tandem mass spectrometry data, is presented. Sequence Temperature Values are computed using a sequence tag algorithm, allowing the degree of implication by an MS/MS spectrum of each region of a database to be determined on a continuum. Counter to conventional approaches, features such as modifications, substitutions, and cleavage events are modeled with probabilities rather than by discrete user-controlled settings that switch a feature on or off. The use of feature probabilities in conjunction with Sequence Temperature Values allows for a very large increase in the effective search space with only a very small increase in the actual number of hypotheses that must be scored. The algorithm has a new kind of user
Protein identification by mass spectrometry: issues to be considered
Mol. Cell. Proteomics, 2004
Cited by 4 (0 self)
During the past two decades, mass spectrometry has become established as the primary method for protein identification from complex mixtures of biological origin. This is largely attributable to the fortunate coincidence of instrumental advances that allow routine analysis of minute amounts (typically femtomoles) of involatile, polar compounds such as peptides in complex mixtures, with the rapid growth in genomic databases that are amenable to searching with mass spectrometry (MS) data. Like many other developing fields in science, the creation of techniques and software tools and the initial generation and interpretation of data have been the domain of experts, people who are cognizant not only of the benefits of the methods but also of their actual and potential weaknesses. Now, as mass spectrometric techniques and proteomic tools become increasingly available and accessible,
De novo peptide sequencing and identification with precision mass spectrometry
J. Proteome Res, 2007
Cited by 4 (1 self)
The recent proliferation of novel mass spectrometers such as Fourier transform, QTOF, and OrbiTrap marks a transition into the era of precision mass spectrometry, providing a two-orders-of-magnitude boost in mass resolution compared with low-precision ion-trap detectors. We investigate peptide de novo sequencing by precision mass spectrometry and explore some of the differences when compared to analysis of low-precision data. We demonstrate how the dramatically improved performance of de novo sequencing with precision mass spectrometry paves the way for novel approaches to peptide identification that are based on direct sequence lookups, rather than comparisons of spectra to a database. With the direct sequence lookup, it is not only possible to search a database very efficiently, but also to use the database in novel ways, such as searching for products of alternative splicing or products of fusion proteins in cancer. Our de novo sequencing software is available for download at
Constrained De Novo Peptide Identification via Multi-objective Optimization
2004
Cited by 3 (0 self)
Automatic de novo peptide identification from collision-induced dissociation tandem mass spectrometry data is made difficult by large plateaus in the fitness landscapes of scoring functions and the fuzzy nature of the constraints, which is due to noise in the data. Two different scoring functions are combined into a parallel multi-objective optimization framework.
1. Peptide identification
High-throughput proteomic techniques seek to characterize the state of the proteome in a cell population. A typical procedure may involve extracting cellular proteins followed by tryptic digestion and then separating the peptides with liquid chromatography. The separated peptides are then identified by tandem mass spectrometry (MS/MS). Ideally, peptides will subsequently be quantitated, post-translational modifications will be determined, and the information regarding the peptides will be assembled into a picture of the proteomic state of a cell population. Accurate identification of peptides is critical for drawing biologically meaningful conclusions. For this reason, there has been much work recently on developing peptide identification methods for MS/MS spectra.
This area of research has proceeded on two fronts, the first of which seeks to take advantage of the wide availability of genome sequences. The database search methods try to identify the peptide that resulted in the observed MS/MS spectrum by picking the best candidate from a list of peptides generated from the genome sequence (e.g. Eng et al. [8] and Perkins et al. [22]). De novo methods, on the other hand, seek to identify a peptide simply from the observed MS/MS spectrum (e.g. Dančík et al. [6], Fernandez-de-Cossio et al. [10], Jarman and Cannon [3,13] and Heredia-Lang...
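The abstract above says only that two scoring functions are combined in a multi-objective framework. One generic way to picture this, offered here as an assumption and not as the authors' procedure, is to keep every candidate peptide that no other candidate beats on both scores at once (the Pareto front). The candidate peptides and score values below are made up.

```python
def dominates(p, q):
    """True if p is at least as good as q on every score and strictly better on one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_front(candidates):
    """Keep candidates whose score tuple is not dominated by any other candidate.

    candidates maps a peptide string to a (score_1, score_2) tuple, higher is better.
    """
    return {
        peptide: scores
        for peptide, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    }


# Hypothetical candidates with hypothetical scores from two scoring functions.
candidates = {
    "PEPTIDE": (0.9, 0.2),
    "PEPTLDE": (0.5, 0.7),
    "PEPSIDE": (0.4, 0.6),
    "PEPTIDK": (0.9, 0.1),
}
print(sorted(pareto_front(candidates)))
```

Keeping a front instead of a single weighted sum avoids having to decide in advance how much one scoring function should count against the other.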
Generating Peptide Candidates from Amino-Acid Sequence Databases for Protein Identification via Mass Spectrometry
In Proceedings of the Second International Workshop on Algorithms in Bioinformatics, 2002
Cited by 2 (1 self)
Protein identification via mass spectrometry forms the foundation of high-throughput proteomics. Tandem mass spectrometry, when applied to a complex mixture of peptides, selects and fragments each peptide to reveal its amino-acid sequence structure. The successful analysis of such an experiment typically relies on amino-acid sequence databases to provide a set of biologically relevant peptides to examine. A key subproblem, then, for amino-acid sequence database search engines that analyze tandem mass spectra is to efficiently generate all the peptide candidates from a sequence database with mass equal to one of a large set of observed peptide masses. We demonstrate that to solve the problem efficiently, we must deal with substring redundancy in the amino-acid sequence database and focus our attention on looking up the observed peptide masses quickly. We show that it is possible, with some preprocessing and memory overhead, to solve the peptide candidate generation problem in time asymptotically proportional to the size of the sequence database and the number of peptide candidates output.
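The candidate generation subproblem described above can be stated in a few lines of code. The sketch below is a naive baseline, not the paper's algorithm: it rescans the sequence from every start position and does nothing about substring redundancy. The function name `peptide_candidates`, the 0.01 Da tolerance, and the reduced mass table are assumptions for illustration.

```python
from bisect import bisect_left

# Standard monoisotopic residue masses in Da (subset, for illustration only).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496}
WATER = 18.010565  # a free peptide carries one extra H2O


def peptide_candidates(sequence, observed_masses, tol=0.01):
    """Return (start, substring, mass) for substrings matching any observed mass."""
    targets = sorted(observed_masses)
    if not targets:
        return []
    hits = []
    for start in range(len(sequence)):
        mass = WATER
        for end in range(start, len(sequence)):
            mass += RESIDUE_MASS[sequence[end]]
            if mass > targets[-1] + tol:
                break  # masses only grow, so longer substrings cannot match
            # Binary search for the smallest target that could lie within tol.
            i = bisect_left(targets, mass - tol)
            if i < len(targets) and targets[i] <= mass + tol:
                hits.append((start, sequence[start:end + 1], round(mass, 4)))
    return hits


print(peptide_candidates("GASGA", [176.0797]))
```

Sorting the observed masses once makes each lookup a binary search, which reflects the abstract's emphasis on fast mass lookup. Repeated substrings in the database are still rescanned and reported once per occurrence, which is the redundancy the paper sets out to handle.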
Getting more from less—algorithms for rapid protein identification with multiple short peptide sequences
Mol. Cell. Proteomics, 2002
Cited by 2 (0 self)
We describe two novel sequence similarity search algorithms, FASTS and FASTF, that use multiple short peptide sequences to identify homologous sequences in protein or DNA databases. FASTS searches with peptide sequences of unknown order, as obtained by mass spectrometry-based sequencing, evaluating all possible arrangements of the peptides. FASTF searches with mixed peptide sequences, as generated by Edman sequencing of unseparated mixtures of peptides. FASTF deconvolutes the mixture, using a greedy heuristic that allows rapid identification of high scoring alignments while reducing the total number of explored alternatives. Both algorithms use the heuristic FASTA comparison strategy to accelerate the search but use alignment probability, rather than similarity score, as the criterion for alignment optimality.

