Results 1 - 8 of 8
Mutation-Tolerant Protein Identification by Mass Spectrometry
, 2000
Cited by 31 (5 self)
Abstract
Database search in tandem mass spectrometry is a powerful tool for protein identification. High-throughput spectral acquisition raises the problem of dealing with genetic variation and peptide modifications within a population of related proteins. A method that cross-correlates and clusters related spectra in large collections of uncharacterized spectra (e.g., from normal and diseased individuals) would be very valuable in functional proteomics. This problem is far from simple, since very similar peptides may have very different spectra. We introduce a new notion of spectral similarity that makes it possible to identify related spectra even if the corresponding peptides carry multiple modifications or mutations. Based on this notion, we developed a new algorithm for mutation-tolerant database search, as well as a method for cross-correlating related uncharacterized spectra.
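One classic way to compare two spectra while tolerating a single mass shift is the spectral convolution: the multiset of all pairwise peak-mass differences. If two peptides differ by one modification of mass delta, the peaks after the modification site are shifted by delta, so the convolution shows multiplicity both at 0 and at delta. The Python sketch below illustrates only that single-shift idea on hypothetical toy peak lists. It is not the paper's algorithm, which handles multiple modifications, and the function names, rounding, and tolerance are illustrative assumptions.

```python
from collections import Counter
from typing import Sequence


def spectral_convolution(s1: Sequence[float], s2: Sequence[float], ndigits: int = 2) -> Counter:
    """Multiset of peak-mass differences s2[j] - s1[i], rounded to `ndigits` decimals.

    Hypothetical helper for illustration; large counts at a nonzero difference
    suggest a mass shift (e.g. a modification) between the two spectra.
    """
    return Counter(round(b - a, ndigits) for a in s1 for b in s2)


def shared_peaks_with_shift(
    s1: Sequence[float], s2: Sequence[float], delta: float, tol: float = 0.5
) -> int:
    """Count peaks of s2 matching some peak of s1 either directly or shifted by delta."""
    return sum(
        1
        for b in s2
        if any(abs(b - a) <= tol or abs(b - (a + delta)) <= tol for a in s1)
    )
```

For the toy spectra s1 = [100.0, 200.0, 300.0] and s2 = [100.0, 216.0, 316.0] (a hypothetical +16 shift affecting the last two peaks), the difference 16.0 occurs twice in the convolution. Allowing that one shift matches all three peaks of s2, versus a single peak without it.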
Efficiency of database search for identification of mutated and modified proteins via mass spectrometry
- GENOME RES
, 2001
Algorithmic complexity of protein identification: Combinatorics of weighted strings
- DISCRETE APPLIED MATHEMATICS, SPECIAL ISSUE ON COMBINATORICS OF SEARCHING, SORTING, AND CODING. (2002)
, 2004
Cited by 2 (0 self)
Abstract
We investigate a problem from computational biology: Given a constant-size alphabet $A$ with a weight function $\mu$ that assigns a positive weight to each character, find an efficient data structure and query algorithm solving the following problem: For a weight $M$ and a string $\sigma$ over $A$, decide whether $\sigma$ contains a substring with weight $M$ (ONE STRING MASS FINDING PROBLEM). If the answer is yes, then we may in addition require a witness, i.e., indices $i \le j$ such that the substring starting at position $i$ and ending at position $j$ has weight $M$. We allow preprocessing of the string and measure efficiency in two parameters: the storage space required for the preprocessed data, and the running time of the query algorithm for a given $M$. We are interested in data structures and algorithms requiring subquadratic storage space and sublinear query time, where the input size is measured as the length $n$ of the input string. We present two efficient algorithms: LOOKUP solves the problem with $O(n)$ space and $O\left(\frac{n}{\log n}\log\log n\right)$ time; INTERVAL solves the problem for binary alphabets with $O(n)$ space in $O(\log n)$ time. We sketch a third algorithm, CLUSTER, which can be adjusted for a space-time tradeoff but for which we do not yet have a resource analysis. We introduce a function on weighted strings that is closely related to the analysis of algorithms for the ONE STRING MASS FINDING PROBLEM: the number of different submasses of a weighted string. We present several properties of this function, including upper and lower bounds. Finally, we introduce two more general variants of the problem and sketch how the algorithms may be extended to these variants.
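To make the two ends of the space/time tradeoff concrete, here is a minimal Python sketch of the trivial baselines, not LOOKUP, INTERVAL, or CLUSTER. The first is a two-pointer scan with no preprocessing, answering a query in $O(n)$ time. The second precomputes the set of all submasses, which can need quadratic storage but answers a query with a set lookup. The size of that set is exactly the number of different submasses. The function names and the toy alphabet are hypothetical, and the scan assumes strictly positive integer weights.

```python
from typing import Dict, Optional, Set, Tuple


def find_submass(s: str, mu: Dict[str, int], M: int) -> Optional[Tuple[int, int]]:
    """Two-pointer scan with no preprocessing: O(n) time per query.

    Returns 1-based indices (i, j) such that s[i..j] has weight M, or None.
    Assumes every weight mu[c] is strictly positive, so extending the window
    to the right increases its weight and advancing the left end decreases it.
    """
    i = 0      # 0-based left end of the current window
    total = 0  # weight of the current window s[i..j]
    for j, ch in enumerate(s):
        total += mu[ch]
        # Shrink from the left while the window is too heavy.
        while total > M and i <= j:
            total -= mu[s[i]]
            i += 1
        if total == M and i <= j:
            return (i + 1, j + 1)
    return None


def all_submasses(s: str, mu: Dict[str, int]) -> Set[int]:
    """Quadratic-storage baseline: precompute every submass so a query is a set lookup.

    len(all_submasses(s, mu)) is the number of different submasses of s.
    """
    masses: Set[int] = set()
    for i in range(len(s)):
        total = 0
        for j in range(i, len(s)):
            total += mu[s[j]]
            masses.add(total)
    return masses
```

For example, with the hypothetical weights {"a": 1, "b": 2}, find_submass("abba", ...) with M = 4 returns (2, 3), the substring "bb", and "abba" has six different submasses, 1 through 6.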
Advances and challenges in liquid chromatography-mass spectrometry-based proteomics profiling for clinical applications
- Mol Cell Proteomics
, 2006
Cited by 1 (0 self)
Abstract
Recent advances in proteomics technologies provide tremendous opportunities for biomarker-related clinical applications; however, the distinctive characteristics of human biofluids, such as the high dynamic range of protein abundances and the extreme complexity of the proteomes, present tremendous challenges. In this review we summarize recent advances in LC-MS-based proteomics profiling and its applications in clinical proteomics, and discuss the major challenges associated with implementing these technologies for more effective candidate biomarker discovery. Developments in immunoaffinity depletion and various fractionation approaches, combined with substantial improvements in LC-MS platforms, have enabled the plasma proteome to be profiled with considerably greater dynamic range of coverage, ...
Probabilistic Arithmetic Automata: Applications of a Stochastic Computational Framework in Biological Sequence Analysis
Abstract
The immense amount of biological sequence data available today requires efficient and sensitive analysis in order to provide, e.g., the identification of unknown proteins or information about the similarity between DNA sequences. Furthermore, new challenges to computational sequence analysis are posed by the short sequence reads produced by modern high-throughput sequencing technologies such as 454 or Solexa/Illumina. Viewing biological sequences, such as DNA and proteins, as strings allows their investigation under a generative random string model. That is, one can define a probabilistic null model that generates random strings as representatives of a class of sequences. From these random strings, one can deduce general statistical properties. In this thesis, we give a thorough derivation of a probabilistic model called the probabilistic arithmetic automaton (PAA). It models sequences of operations associated with operands that depend on chance, and provides the computational framework to calculate the exact distribution of the value resulting from those operations. For ...
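The core idea of computing an exact value distribution can be shown with a heavily simplified sketch. A full PAA couples a Markov chain of states with per-state emissions and operations; the Python code below collapses that to a single state with i.i.d. integer emissions and one binary operation. It then propagates probability mass step by step using exact rational arithmetic. The function name `value_distribution` and its parameters are hypothetical and not taken from the thesis.

```python
import operator
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict


def value_distribution(
    emission_probs: Dict[int, Fraction],
    steps: int,
    op: Callable[[int, int], int] = operator.add,
    start: int = 0,
) -> Dict[int, Fraction]:
    """Exact distribution of v_steps, where v_0 = start and v_t = op(v_{t-1}, X_t).

    X_1, ..., X_steps are i.i.d. draws from emission_probs. This is a
    single-state simplification of a PAA, used here only for illustration.
    """
    dist: Dict[int, Fraction] = {start: Fraction(1)}
    for _ in range(steps):
        nxt: Dict[int, Fraction] = defaultdict(Fraction)  # Fraction() == 0
        for value, p in dist.items():
            for x, q in emission_probs.items():
                # Probability mass flows from `value` to op(value, x).
                nxt[op(value, x)] += p * q
        dist = dict(nxt)
    return dist
```

With integer residue masses as emissions and addition as the operation, this gives the exact mass distribution of a random string of fixed length. Passing `op=max` instead yields the distribution of the maximum emitted value, which hints at why a generic operation is a useful abstraction.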
Overlapping MS/MS spectra and disease proteomics
Abstract
The ongoing success of the proteomics endeavor is the result of a prolific symbiosis between experimental ingenuity [2, 3, 4] and efficient bioinformatics [5, 6, 7, 8, 9, 10, 11]. Without these, ground-breaking landmarks such as the human genome project [12, 13] or the HUPO initiative [14] would likely not have seen the light of day. But despite valuable contributions, the road to a better understanding of disease proteomics is still hindered by significant difficulties in the extensive identification of post-translational modifications and in the sequencing of novel proteins such as cancer fusion proteins or antibody chains. Recently, tandem mass spectrometry (MS/MS)-based approaches seemed to be reaching the limit of the information that could be extracted from MS/MS spectra [15, 16, 17]. However, a closer look reveals that a common limiting practice is to analyze each spectrum in isolation, even though high-throughput mass spectrometry regularly generates many spectra from related peptides. By capitalizing on this redundancy, we have shown that, similarly to the alignment of protein sequences [5], unidentified MS/MS spectra can also be aligned to identify modified and unmodified variants of the same peptide. Moreover, this alignment procedure can be iterated to accurately group multiple peptide variants (Figure 1). The highly correlated peaks in spectra from variants of the same peptide allowed us to reliably identify all known, and even some unknown, modifications in a sample of cataractous lenses.
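The grouping of related spectra can be illustrated with a deliberately simple stand-in for pairwise spectral alignment. Each pair of spectra is scored by how many peaks can be matched when a single mass shift is allowed. Pairs scoring at or above a threshold are linked, and connected components are taken with a union-find. This is a hypothetical Python sketch, not the authors' alignment or iterative procedure; `one_shift_score`, `group_spectra`, the tolerance, and the threshold are all assumptions.

```python
from itertools import combinations
from typing import List, Sequence


def one_shift_score(s1: Sequence[float], s2: Sequence[float], tol: float = 0.5) -> int:
    """Max number of s2 peaks explained by s1 peaks, unshifted or shifted by one delta.

    Candidate deltas are the pairwise differences (the spectral convolution support).
    """
    deltas = {round(b - a, 2) for a in s1 for b in s2}
    best = 0
    for d in deltas:
        matched = sum(
            1 for b in s2 if any(abs(b - a) <= tol or abs(b - a - d) <= tol for a in s1)
        )
        best = max(best, matched)
    return best


def group_spectra(spectra: List[Sequence[float]], min_shared: int) -> List[List[int]]:
    """Single-linkage grouping: connected components of the 'score >= min_shared' graph."""
    parent = list(range(len(spectra)))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(spectra)), 2):
        if one_shift_score(spectra[i], spectra[j]) >= min_shared:
            parent[find(i)] = find(j)

    groups: dict = {}
    for i in range(len(spectra)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With toy spectra [100, 200, 300], [100, 216, 316], and [150, 420, 610] and `min_shared=3`, the first two (related by a +16 shift) form one group and the third stays alone. Single linkage is chosen here only for brevity; it can chain dissimilar spectra together, which is one reason a more careful, iterated procedure is preferable in practice.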

