BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark
- Proteins, 2005
Cited by 48 (1 self)
Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high-quality, manually refined reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site
3DCoffee: Combining protein sequences and structures within multiple sequence alignments
- J Mol Biol, 2004
Cited by 29 (5 self)
It has long been assumed that using structural information can increase the accuracy of multiple protein sequence alignments (MSA) [1]. Recent results [2,3] suggest that accurate MSAs obtained this
The ProDom database of protein domain families: more emphasis on 3D
- Nucleic Acids Res, 2005
doi:10.1093/nar/gki034
Sequence and comparative genomic analysis of actin-related proteins
- Mol Biol Cell
Cited by 4 (1 self)
Actin-related proteins (ARPs) are key players in cytoskeleton activities and nuclear functions. Two complexes, ARP2/3 and ARP1/11, also known as dynactin, are implicated in actin dynamics and in microtubule-based trafficking, respectively. ARP4 to ARP9 are components of many chromatin-modulating complexes. Conventional actins and ARPs codefine a large family of homologous proteins, the actin superfamily, with a tertiary structure known as the actin fold. Because ARPs and actin share high sequence conservation, clear family definition requires distinct features to easily and systematically identify each subfamily. In this study we performed an in-depth sequence and comparative genomic analysis of ARP subfamilies. A high-quality multiple alignment of ~700 complete protein sequences homologous to actin, including 148 ARP sequences, allowed us to extend the ARP classification to new organisms. Sequence alignments revealed conserved residues, motifs, and inserted sequence signatures to define each ARP subfamily. These discriminative characteristics allowed us to develop ARPAnno
IMGT standardization and analysis of V-LIKE-, C-LIKE and G-LIKE-DOMAINs
- 2003
Cited by 1 (1 self)
Introduction: IMGT, the international ImMunoGeneTics information system (http://imgt.cines.fr) [1], was created in 1989 at Montpellier. IMGT specializes in immunoglobulins (IG), T cell receptors (TR) and major histocompatibility complex (MHC) and related proteins of the immune system (RPI) from human and other vertebrate species. Three structural and functional domains have been identified for the IG and TR chains (Variable V-DOMAIN or Constant C-DOMAIN) and for the MHC chains (Groove G-DOMAIN). IMGT-ONTOLOGY [2] provides a standardized description of these domains. An IMGT unique numbering was established for the V- and C-DOMAIN, whatever the receptor (IG or TR), the chain type and the species [3]; more recently, an IMGT unique numbering was established for the G-DOMAIN. In the past few years, domains of similar folds have been found in proteins other than IG, TR and MHC which are involved in a variety of biological processes and which interact with a range of different ligands; these
A Simple Method to Predict Protein Binding From Aligned Sequences
- 2005
Motivation: The MHC superfamily (MhcSF) consists of immune system MHC class I (MHC-I) proteins, along with proteins with an MHC-I-like structure that are involved in a large variety of biological processes. Noncovalent binding of beta2-microglobulin (B2M) to MHC-I proteins is required for their surface expression and function, while MHC-I-like proteins may or may not interact with B2M. This study was designed to predict B2M binding (or non-binding) of newly identified MhcSF proteins, in order to decipher their function, understand the molecular recognition mechanisms, and identify deleterious mutations. IMGT standardization of MhcSF protein domains provides a unique numbering of the multiple alignment positions, and conditions to develop such a predictive tool.
Protein Multiple Sequence Alignment by Hybrid Immunological Algorithms
This paper presents an immune-inspired algorithm to tackle and optimize the multiple sequence alignment (MSA) problem. MSA is one of the most important tasks in biological sequence analysis. Although this paper focuses on protein alignments, most of the discussion and methodology may also be applied to DNA alignments. The presented algorithm, called IMSA, incorporates two new strategies to create the initial population, and specific ad-hoc mutation operators. It is based on the classical weighted sum of pairs as objective function, to evaluate a given candidate alignment. IMSA was tested using both classical BAliBASE benchmarks (versions 1.0 and 2.0), and experimental results indicate that it is comparable with state-of-the-art MSA methods in terms of quality of alignments, SP and CS score values. The main novelty of IMSA is the ability to generate more than a single sub-optimal alignment for every MSA instance; this behaviour is due to the stochastic nature of the algorithm and of the populations evolved during the convergence process. This feature will help the decision maker to assess and select the biologically relevant multiple sequence alignment.
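The weighted sum-of-pairs objective named in this abstract can be illustrated with a minimal sketch. It is not IMSA's implementation: the function names, the flat match/mismatch/gap scores and the per-sequence weights are all assumptions made for illustration. Practical scorers typically use a substitution matrix and affine gap penalties instead.

```python
from itertools import combinations


def pair_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score one aligned pair of symbols; '-' denotes a gap.

    The flat match/mismatch/gap values are illustrative assumptions.
    """
    if a == "-" and b == "-":
        return 0.0  # gap-gap pairs contribute nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def weighted_sum_of_pairs(alignment, weights=None):
    """Sum column scores over all sequence pairs, scaled by w_i * w_j.

    `alignment` is a list of equal-length gapped strings; `weights` is an
    optional list of per-sequence weights (hypothetical, default 1.0 each).
    """
    if not alignment:
        return 0.0
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have equal length")
    if weights is None:
        weights = [1.0] * len(alignment)
    total = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        pair_total = sum(pair_score(a, b) for a, b in zip(alignment[i], alignment[j]))
        total += weights[i] * weights[j] * pair_total
    return total
```

Under these assumed scores, the three-row alignment ["AC-T", "ACGT", "A-GT"] scores 0.0 with uniform weights and -1.0 with weights [1.0, 1.0, 2.0].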
REVIEW Strategies for Reliable Exploitation of Evolutionary Concepts in High Throughput Biology
The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high-throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology.
Assessing the Discordance of Multiple Sequence Alignments
Multiple sequence alignments have wide applicability in many areas of computational biology, including comparative genomics, functional annotation of proteins, gene finding, and modeling evolutionary processes. Because of the computational difficulty of multiple sequence alignment and the availability of numerous tools, it is critical to be able to assess the reliability of multiple alignments. We present a tool called StatSigMA to assess whether multiple alignments of nucleotide or amino acid sequences are contaminated with one or more unrelated sequences. There are numerous applications for which StatSigMA can be used. Two such applications are to distinguish homologous sequences from nonhomologous ones and to compare alignments produced by various multiple alignment tools. We present examples of both types of applications. Index Terms: Multiple sequence alignment, discordance, alignment accuracy, Karlin-Altschul statistics, biology and genetics, life and medical sciences, computer applications.

