Results 1 - 10
of
12
An Introduction to Bioinformatics Algorithms
, 2004
"... In the early 1990s when one of us was teaching his first bioinformatics class, he was not sure that there would be enough students to teach. Although ..."
Abstract
-
Cited by 34 (0 self)
- Add to MetaCart
In the early 1990s when one of us was teaching his first bioinformatics class, he was not sure that there would be enough students to teach. Although
Mutation-Tolerant Protein Identification by Mass Spectrometry
, 2000
"... Database search in tandem mass spectrometry is a powerful tool for protein identification. High-throughput spectral acquisition raises the problem of dealing with genetic variation and peptide modifications within a population of related proteins. A method that cross-correlates and clusters related ..."
Abstract
-
Cited by 31 (5 self)
- Add to MetaCart
Database search in tandem mass spectrometry is a powerful tool for protein identification. High-throughput spectral acquisition raises the problem of dealing with genetic variation and peptide modifications within a population of related proteins. A method that cross-correlates and clusters related spectra in large collections of uncharacterized spectra (i.e., from normal and diseased individuals) would be very valuable in functional proteomics. This problem is far from being simple since very similar peptides may have very different spectra. We introduce a new notion of spectral similarity that allows one to identify related spectra even if the corresponding peptides have multiple modifications/mutations. Based on this notion, we developed a new algorithm for mutation-tolerant database search as well as a method for cross-correlating related uncharacterized spectra.
SPIDER: Software for Protein identification from Sequence Tags with De Novo Sequencing Error
- J Bioinform Comput Biol
, 2004
"... For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting amino acid mutations, the sequences in a protein database. If the de n ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting amino acid mutations, the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same masses. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
Efficiency of database search for identification of mutated and modified proteins via mass spectrometry
- GENOME RES
, 2001
"... ..."
ADVANCEMENT IN PROTEIN INFERENCE FROM SHOTGUN PROTEOMICS USING PEPTIDE DETECTABILITY
"... A major challenge in shotgun proteomics has been the assignment of identified peptides to the proteins from which they originate, referred to as the protein inference problem. Redundant and homologous protein sequences present a challenge in being correctly identified, as a set of peptides may in ma ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
A major challenge in shotgun proteomics has been the assignment of identified peptides to the proteins from which they originate, referred to as the protein inference problem. Redundant and homologous protein sequences present a challenge in being correctly identified, as a set of peptides may in many cases represent multiple proteins. One simple solution to this problem is the assignment of the smallest number of proteins that explains the identified peptides. However, it is not certain that a natural system should be accurately represented using this minimalist approach. In this paper, we propose a reformulation of the protein inference problem by utilizing the recently introduced concept of peptide detectability. We also propose a heuristic algorithm to solve this problem and evaluate its performance on synthetic and real proteomics data. In comparison to a greedy implementation of the minimum protein set algorithm, our solution that incorporates peptide detectability performs favorably. 1.
Comparison of Spectra in Unsequenced Species
"... Abstract. We introduce a new algorithm for the mass spectrometric identification of proteins. Experimental spectra obtained by tandem MS/MS are directly compared to theoretical spectra generated from proteins of evolutionarily closely related organisms. This work is motivated by the need of a method ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract. We introduce a new algorithm for the mass spectrometric identification of proteins. Experimental spectra obtained by tandem MS/MS are directly compared to theoretical spectra generated from proteins of evolutionarily closely related organisms. This work is motivated by the need of a method that allows the identification of proteins of unsequenced species against a database containing proteins of related organisms. The idea is that matching spectra of unknown peptides to very similar MS/MS spectra generated from this database of annotated proteins can lead to annotate unknown proteins. This process is similar to ortholog annotation in protein sequence databases. The difficulty with such an approach is that two similar peptides, even with just one modification (i.e. insertion, deletion or substitution of one or several amino acid(s)) between them, usually generate very dissimilar spectra. In this paper, we present a new dynamic programming based algorithm: Packet-SpectralAlignment. Our algorithm is tolerant to modifications and fully exploits two important properties that are usually not considered: the notion of inner symmetry, a relation linking pairs of spectrum peaks, and the notion of packet inside each spectrum to keep related peaks together. Our algorithm, PacketSpectralAlignment is then compared to SpectralAlignment [1] on a dataset of simulated spectra. Our tests show that PacketSpectralAlignment behaves better, in terms of results and execution time. 1
The RESID Database of protein structure modifications and the NRL-3D Sequence--Structure Database
- Nucleic Acids Res
, 2001
"... The RESID Database is a comprehensive collection of annotations and structures for protein post-translational modifications including N-terminal, C-terminal and peptide chain cross-link modifications. The RESID Database includes systematic and frequently observed alternate names, Chemical Abstracts ..."
Abstract
- Add to MetaCart
The RESID Database is a comprehensive collection of annotations and structures for protein post-translational modifications including N-terminal, C-terminal and peptide chain cross-link modifications. The RESID Database includes systematic and frequently observed alternate names, Chemical Abstracts Service registry numbers, atomic formulas and weights, enzyme activities, taxonomic range, keywords, literature citations with database crossreferences, structural diagrams and molecular models. The NRL-3D Sequence--Structure Database is derived from the three-dimensional structure of proteins deposited with the Research Collaboratory for Structural Bioinformatics Protein Data Bank. The NRL-3D Database includes standardized and frequently observed alternate names, sources, keywords, literature citations, experimental conditions and searchable sequences from model coordinates. These databases are freely accessible through the National Cancer Institute--Frederick Advanced Biomedical Computing Center at these web sites: http://www.ncifcrf.gov/RESID, http://www.ncifcrf.gov/ NRL-3D; or at these National Biomedical Research Foundation Protein Information Resource web sites: http://pir.georgetown.edu/pirwww/dbinfo/resid.html, http://pir.georgetown.edu/pirwww/dbinfo/nrl3d.html
Giardia lamblia and Trichomonas vaginalis
, 2008
"... resources for the eukaryotic protist pathogens ..."

