Molecular Fossils in the Human Genome: Identification and analysis of the pseudogenes in chromosomes 21 and 22
, 2001
Cited by 36 (20 self)
We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e. mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into 'processed' and 'non-processed'; the former are reverse-transcribed from mRNA (and therefore have no intron structure) whereas the latter presumably arise from genomic duplications. We annotate putative processed pseudogenes based on whether there is a continuous span of homology that is >70% of the length of the closest matching human protein (i.e. with introns removed), or whether there is evidence of polyadenylation. We have applied our approach to chromosomes 21 and 22, the first parts of the human genome completely sequenced, finding 190 new pseudogene annotations beyond the 264 reported by the sequencing centres. In total, on chromosomes 21 and 22, there are 189 processed pseudogenes, 195 non-processed pseudogenes and, additionally, 70 pseudogenic immunoglobulin gene segments. (Detailed assignments are available at http://bioinfo.mbb.yale.edu/genome/pseudogene.) By extrapolation, we predict that there could be up to ~20,000 pseudogenes in the whole human genome, with a little more than half of them processed. We have determined the main populations and clusters of pseudogenes on chromosomes 21 and 22. There are notable excesses of pseudogenes relative to genes near the centromeres of both chromosomes, suggesting the existence of pseudogenic 'hot-spots' in the genome. We have looked at the distribution of InterPro families and GO functional categories in our pseudogenes. Overall, the families in both processed ...
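The processed/non-processed rule stated in this abstract can be sketched in a few lines of Python. This is only an illustration of the stated criteria (a continuous homology span covering more than 70% of the closest matching protein, or evidence of polyadenylation), not the authors' pipeline. The `Candidate` record, its field names, and the fallback label for hits that meet neither criterion are assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one disabled protein-homology hit."""
    name: str
    aligned_span: int     # residues covered by one continuous span of homology
    protein_length: int   # length of the closest matching human protein
    has_polya: bool       # True if there is evidence of polyadenylation


def classify(c: Candidate, min_coverage: float = 0.70) -> str:
    """Apply the abstract's rule: >70% continuous coverage OR polyadenylation."""
    coverage = c.aligned_span / c.protein_length
    if coverage > min_coverage or c.has_polya:
        return "processed"
    # Assumption: anything failing both criteria is labelled non-processed.
    return "non-processed"


# The counts quoted in the abstract are internally consistent:
# new + previously reported annotations equals the sum of the three classes.
total_by_source = 190 + 264
total_by_class = 189 + 195 + 70
```

For example, `classify(Candidate("hit1", 80, 100, False))` gives `"processed"`, while the same hit with a 50-residue span and no polyadenylation signal gives `"non-processed"`.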
The alternative splicing gallery (ASG): bridging the gap between genome and transcriptome
- Nucleic Acids Res
, 2004
Cited by 21 (0 self)
Alternative splicing essentially increases the diversity of the transcriptome and has important implications for physiology, development and the genesis of diseases. Conventionally, alternative splicing is investigated in a case-by-case fashion, but this becomes cumbersome and error prone if genes show a huge abundance of different splice variants. We use a different approach and integrate all transcripts derived from a gene into a single splicing graph. Each transcript corresponds to a path in the graph, and alternative splicing is displayed by bifurcations. This representation preserves the relationships between different splicing variants and allows us to investigate systematically all possible putative transcripts. We built a database of splicing graphs for human genes, using transcript information from various major sources (Ensembl, RefSeq, STACK, TIGR and UniGene). A Web interface allows users to display the splicing graphs, to interactively assemble transcripts and to access their sequences as well as neighboring genomic regions. We also provide for each gene an exhaustive pre-computed catalog of putative transcripts, in total more than 1.2 million sequences. We found that 65% of the investigated genes show evidence for alternative splicing, and in 5% of the cases, a single gene might produce over 100 transcripts.
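The splicing-graph idea in this abstract (transcripts as paths, alternative splicing as bifurcations) can be illustrated with a small Python sketch. It is not the ASG implementation: the exon labels, the function names, and the assumption that exons are ordered so the graph has no cycles are all hypothetical choices made for this example.

```python
from collections import defaultdict


def build_splicing_graph(transcripts):
    """Merge transcripts (lists of exon labels) into one directed graph."""
    graph = defaultdict(set)
    starts, ends = set(), set()
    for exons in transcripts:
        starts.add(exons[0])
        ends.add(exons[-1])
        for upstream, downstream in zip(exons, exons[1:]):
            graph[upstream].add(downstream)
    return graph, starts, ends


def enumerate_paths(graph, starts, ends):
    """List every start-to-end path; each one is a putative transcript.

    Assumes the graph is acyclic (exons ordered along the gene).
    """
    paths = []

    def walk(node, path):
        if node in ends:
            paths.append(path)
        for nxt in sorted(graph.get(node, ())):
            walk(nxt, path + [nxt])

    for start in sorted(starts):
        walk(start, [start])
    return paths


# Two observed transcripts of a made-up gene.
observed = [["E1", "E2", "E3", "E5"], ["E1", "E3", "E4", "E5"]]
graph, starts, ends = build_splicing_graph(observed)
putative = enumerate_paths(graph, starts, ends)
# Nodes with more than one successor are the bifurcations.
bifurcations = {node for node, succ in graph.items() if len(succ) > 1}
```

Here two observed transcripts yield four paths, so the graph proposes two putative transcripts (`E1-E2-E3-E4-E5` and `E1-E3-E5`) that were not in the input. This is the sense in which a graph allows systematic enumeration of candidate isoforms.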
Computational analysis and experimental validation of tumor-associated alternative RNA splicing in human cancer
- Cancer Res
, 2003
Cited by 17 (0 self)
A genome-wide computational screen was performed to identify tumor-associated alternative RNA splicing isoforms. A BLAST algorithm was used to compare 11,014 genes from RefSeq with 3,471,822 human expressed sequence tag sequences. The screen identified 26,258 alternative splicing isoforms of which 845 were significantly associated with human cancer, and 54 were specifically associated with liver cancer. Furthermore, canonical GT-AG splice junctions were used significantly less frequently in the alternative splicing isoforms in tumors. Reverse transcription-PCR experiments confirmed association of the alternative splicing isoforms with tumors. These results suggest that alternative splicing may have potential as a diagnostic marker for cancer.
A question of size: the eukaryotic proteome and the problems in defining it
- Nucl. Acids. Res
, 2002
Cited by 16 (9 self)
We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to a fifth of the ‘current’ proteome. We discuss these in context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences (‘the orfome’). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, relating to both genome-annotation and genome evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, in the first instance, for the human genome the relationship between gene number and proteome size is far from simple. We survey the current estimates for the numbers of human genes and, from this, we estimate the range in the size of the human proteome. The determination of this is substantially hampered by the unknown extent of the cohort of pseudogenes (‘dead’ genes), in combination with the prevalence of alternative splicing. (Further information relating to yeast is available at
Large Scale Study of Protein Domain Distribution in the Context of Alternative Splicing
- Nucleic Acids Res
, 2003
Cited by 10 (0 self)
Alternative splicing plays an important role in processes such as development, differentiation and cancer. With the recent increase in the estimates of the number of human genes that undergo alternative splicing from 5 to 35-59%, it is becoming critical to develop a better understanding of its functional consequences and regulatory mechanisms. We conducted a large scale study of the distribution of protein domains in a curated data set of several thousand genes and identified protein domains disproportionately distributed among alternatively spliced genes. We also identified a number of protein domains that tend to be spliced out. Both the proteins having the disproportionately distributed domains as well as those with spliced-out domains are predominantly involved in the processes of cell communication, signaling, development and apoptosis. These proteins function mostly as enzymes, signal transducers and receptors. Somewhat surprisingly, 28% of all occurrences of spliced-out domains are not effected by straightforward exclusion of exons coding for the domains but by inclusion or exclusion of other exons to shift the reading frame while retaining the exons coding for the domains in the final transcripts.
Improved spliced alignment from an information theoretic approach
- Bioinformatics
, 2006
Cited by 8 (0 self)
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org
A simple physical model predicts small exon length variations
- PLoS Genet
, 2006
Cited by 6 (2 self)
One of the most common splice variations are small exon length variations caused by the use of alternative donor or acceptor splice sites that are in very close proximity on the pre-mRNA. Among these, three-nucleotide variations at so-called NAGNAG tandem acceptor sites have recently attracted considerable attention, and it has been suggested that these variations are regulated and serve to fine-tune protein forms by the addition or removal of a single amino acid. In this paper we first show that in-frame exon length variations are generally overrepresented and that this overrepresentation can be quantitatively explained by the effect of nonsense-mediated decay. Our analysis allows us to estimate that about 50% of frame-shifted coding transcripts are targeted by nonsense-mediated decay. Second, we show that a simple physical model that assumes that the splicing machinery stochastically binds to nearby splice sites in proportion to the affinities of the sites correctly predicts the relative abundances of different small length variations at both boundaries. Finally, using the same simple physical model, we show that for NAGNAG sites, the difference in affinities of the neighboring sites for the splicing machinery accurately predicts whether splicing will occur only at the first site, splicing will occur only at the second site, or three-nucleotide splice variants are likely to occur. Our analysis thus suggests that small exon length variations are the result of stochastic binding of the spliceosome at neighboring splice sites. Small exon length variations occur when there are nearby alternative splice sites that have similar affinity for the splicing machinery.
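The core assumption of this abstract, that the splicing machinery binds nearby sites in proportion to their affinities, can be written as a toy calculation in Python. The affinity values, the function names, and especially the `minor_cutoff` threshold for calling a site effectively unused are hypothetical; the abstract does not say how affinities are scored or where the paper draws that line.

```python
def site_usage(affinities):
    """Expected usage of each site: its affinity divided by the total."""
    total = sum(affinities.values())
    return {site: affinity / total for site, affinity in affinities.items()}


def nagnag_outcome(first, second, minor_cutoff=0.05):
    """Predict the splicing pattern at a tandem acceptor from two affinities.

    minor_cutoff is an illustrative threshold, not a value from the paper.
    """
    usage = site_usage({"first": first, "second": second})
    if usage["second"] < minor_cutoff:
        return "first site only"
    if usage["first"] < minor_cutoff:
        return "second site only"
    return "both sites (three-nucleotide variants)"
```

With these made-up numbers, `site_usage({"a": 3, "b": 1})` returns usage fractions of 0.75 and 0.25. `nagnag_outcome(100, 1)` predicts the first site only, whereas equal affinities predict that both three-nucleotide variants occur. This matches the abstract's closing point that variation appears when neighboring sites have similar affinity.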
Computational Analysis of Alternative Splicing Using EST Tissue Information
, 2002
Cited by 4 (1 self)
this article are available on IDEAL (http://www.idealibrary.com)
Large-scale data sharing in the life sciences: Data standards, incentives, barriers and funding models, The Joint Data Standards Study, http://www.mrc.ac.uk/pdfjdss_final_report.pdf
Cited by 4 (1 self)
Recent studies have demonstrated the value of sharing and re-using data in the life sciences. This study builds on that premise, exploring the practice of sharing data and identifying incentives, facilitators and obstacles to data sharing. In the light of its findings the study presents characteristics of models which support effective, ethical data sharing, to enable first-class, innovative and productive science, within and across disciplines. The study spanned the full spectrum of the life sciences, and all data types and methods. It examined ten case studies, including two comparators from outside the life sciences. These case studies were supplemented by interviews with key informants and by desk research. Key recommendations arising from the study include: insistence on a data management plan, clearly defined remit and goals, sustained work on the development of vocabularies and ontologies, awareness of the needs of archiving and long-term preservation, gathering user input into tools development programmes, and a code of practice for managing and sharing confidential data.
The Effects of Alternative Splicing on Transmembrane Proteins in the Mouse Genome
- in ‘Proceedings of the 9th Pacific Symposium on Biocomputing’, 6th–9th January
, 2004
Cited by 4 (0 self)
this paper is available online at http://www.affymetrix.com/community/publications/affymetrix/tmsplice/

