Results 1 - 10 of 34
Prediction of complete gene structures in human genomic DNA
J. Mol. Biol., 1997
Cited by 511 (7 self)
The problem of identifying genes in genomic DNA sequences by computational methods has attracted considerable research attention in recent years. From one point of view, the problem is closely
Eukaryotic Promoter Recognition
Genome Res., 1997
Cited by 57 (0 self)
... http://gnomic.stanford.edu/~chris/GENSCANW.html). Because the signals that control the start and stop of transcription and translation, and the location of splicing, are still not very well understood, it is not uncommon for a gene-finding algorithm to confuse internal with initial and terminal exons, thus wrongly partitioning the exons. The problem is compounded by our incomplete understanding of alternative splicing control elements. Another line of development in gene identification is based on homology (e.g., Gish and States 1993; Gelfand et al. 1996). If there is a close homolog in the databases to one of the genes in the sequence under analysis, sequence similarity will usually group the exons for this gene correctly. Still, in many cases there is no close homolog and no guarantee when there is some homolog that the encoded protein lacks insertions/deletions. Clearly, some means of recognizing the beginnings of genes, probably via the promoter, or the ends, probabl...
Computational Methods for the Identification of Genes in Vertebrate Genomic Sequences
Hum. Mol. Genet., 1997
Cited by 36 (3 self)
Research into new methods to identify genes in anonymous genomic sequences has been going on for more than 15 years. Over this period of time, the field has evolved from the designing of programs to identify protein coding regions in compact mitochondrial or bacterial genomes, to the challenge of predicting the detailed organization of multi-exon vertebrate genes. The best program currently available perfectly locates more than 80% of the internal coding exons, and only 5% of the predictions do not overlap a real exon. Given such accuracy, computational methods are indeed very useful; however, they do not alleviate the need for experimental validation. If the performances are satisfactory for the identification of the coding moiety of genes (internal coding exons), the determination of the full extent of the transcript (5′ and 3′ extremities of the gene) and the location of promoter regions are still unreliable. As the human and mouse genome sequencing projects enter a production mode, the fully automated annotation of megabase-long anonymous genomic sequences is the next big challenge in bioinformatics.
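The two accuracy figures quoted in this abstract (exons located perfectly, and predictions that overlap no real exon) can be computed along the following lines. This is a minimal sketch, not the evaluation procedure used in the paper: the function name `exon_level_accuracy`, the inclusive `(start, end)` coordinate convention, and the toy coordinates are all assumptions made for illustration.

```python
def exon_level_accuracy(predicted, actual):
    """Return (fraction of real exons predicted exactly,
    fraction of predictions overlapping no real exon).

    Exons are (start, end) tuples with inclusive coordinates
    (an assumed convention for this sketch).
    """
    actual_set = set(actual)
    predicted_set = set(predicted)

    # A real exon counts as "perfectly located" only if both
    # boundaries match a prediction exactly.
    exact = len(actual_set & predicted_set)
    exact_fraction = exact / len(actual_set) if actual_set else 0.0

    def overlaps(a, b):
        # Two inclusive intervals overlap if neither ends before the other starts.
        return a[0] <= b[1] and b[0] <= a[1]

    # A prediction is "wrong" if it shares no base with any real exon.
    wrong = sum(
        1 for p in predicted_set
        if not any(overlaps(p, a) for a in actual_set)
    )
    wrong_fraction = wrong / len(predicted_set) if predicted_set else 0.0
    return exact_fraction, wrong_fraction


# Toy data (hypothetical coordinates, not taken from any real gene).
real_exons = [(100, 200), (300, 400), (500, 600), (700, 800), (900, 1000)]
predictions = [(100, 200), (300, 400), (500, 600), (700, 790), (1200, 1300)]
print(exon_level_accuracy(predictions, real_exons))
```

In the toy data, the prediction `(700, 790)` overlaps a real exon without matching its boundaries, so it counts against the first figure but not toward the second.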
The Biology of Eukaryotic Promoter Prediction -- a Review
Comput. Chem., 1999
Cited by 26 (0 self)
Computational prediction of eukaryotic promoters from the nucleotide sequence is one of the most attractive problems in sequence analysis today, but it is also a very difficult one. Thus, current methods predict in the order of one promoter per kilobase in human DNA, while the average distance between functional promoters has been estimated to be in the range of 30-40 kilobases. Although it is conceivable that some of these predicted promoters correspond to cryptic initiation sites that are used in vivo, it is likely that most are false positives. This suggests that it is important to carefully reconsider the biological data that forms the basis of current algorithms, and we here present a review of data that may be useful in this regard. The review covers the following topics: (1) basal transcription and core promoters, (2) activated transcription and transcription factor binding sites, (3) CpG islands and DNA methylation, (4) chromosomal structure and nucleosome modification, and (5) chromosomal domains and domain boundaries. We discuss the possible lessons that may be learned, especially with respect to the wealth of information about epigenetic regulation of transcription that has been appearing in recent years.
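The gap between roughly one predicted promoter per kilobase and one functional promoter per 30-40 kilobases implies a lower bound on the false-positive share of predictions. The short sketch below works through that arithmetic. It assumes the best case, in which every functional promoter is among the predictions, and the function name is hypothetical.

```python
def min_false_positive_fraction(predictions_per_kb, kb_between_true_promoters):
    """Lower bound on the fraction of predictions that are false positives.

    Assumes the best case: every functional promoter is predicted.
    """
    true_per_kb = 1.0 / kb_between_true_promoters
    # Even if all true promoters are found, the remaining predictions are false.
    return max(0.0, 1.0 - true_per_kb / predictions_per_kb)


# Figures from the abstract: about 1 prediction per kb,
# and 30 to 40 kb between functional promoters.
for spacing_kb in (30, 40):
    share = min_false_positive_fraction(1.0, spacing_kb)
    print(f"{spacing_kb} kb spacing: at least {share:.1%} false positives")
```

Under these assumptions the bound lies between about 96.7% and 97.5%, which is consistent with the abstract's statement that most predictions are likely false positives.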
Identification of Genes in Human Genomic DNA
1997
Cited by 23 (1 self)
A general probabilistic model of the gene structural and compositional properties of human genomic DNA is introduced and applied to the problem of identifying genes in unannotated human genomic sequences. The model uses a "Hidden semi-Markov" or semi-Markov source architecture which incorporates probabilistic descriptions of fundamental transcriptional, translational and splicing signals, as well as length distributions and compositional features of exons, introns and intergenic regions. Distinct sets of model parameters are derived which account for many of the substantial differences in gene density and structure observed in distinct C+G compositional regions ("isochores") of the human genome. A novel model building procedure, termed Maximal Dependence Decomposition, is introduced which captures potentially important dependencies between non-adjacent as well as adjacent positions in a biological signal. Application of this model to the donor splice signal not only gives better discrimination of potential donor sites than previous probabilistic models, but also reveals subtle properties of this signal which suggest aspects of its biochemical function. Acceptor
Identification of human gene core promoters in silico
Genome Research, 1998
Cited by 19 (2 self)
Identification of the 5′-end of human genes requires identification of functional promoter elements. In silico identification of those elements is difficult because of the hierarchical and modular nature of promoter architecture. To address this problem, I propose a new stepwise strategy based on initial localization of a functional promoter into a 1-2 kb (extended-promoter) region from within a large genomic DNA sequence of 100 kb or larger, and further localization of a Transcriptional Start Site (TSS) into a 50-100 bp (core-promoter) region. Using positional dependent 5-tuple measures, a Quadratic Discriminant Analysis (QDA) method has been implemented in a new program, CorePromoter. Our experiments indicate that when given a 1-2 kb extended promoter, CorePromoter will correctly localize the TSS to a 100 bp interval approximately 60% of the time.
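The abstract does not define its position-dependent 5-tuple measures, so the following is only a rough illustration of the general idea of counting 5-tuples separately by position. It is not CorePromoter's feature set and does not include the QDA step. The function name, the window size, and the choice to ignore tuples that span a window boundary are all assumptions.

```python
from collections import Counter


def kmer_counts_by_window(sequence, k=5, window=50):
    """Count overlapping k-tuples separately in each consecutive window.

    Tuples that would span a window boundary are ignored, which is a
    simplification made for this sketch.
    """
    sequence = sequence.upper()
    counts = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        # range() is empty when the chunk is shorter than k,
        # so a short trailing window yields an empty Counter.
        counts.append(
            Counter(chunk[i:i + k] for i in range(len(chunk) - k + 1))
        )
    return counts


# Toy sequence (hypothetical): two 6-base windows, counted independently.
per_window = kmer_counts_by_window("TATAAATATAAA", k=5, window=6)
for index, counter in enumerate(per_window):
    print(index, dict(counter))
```

Per-window count vectors like these could then be passed to a discriminant classifier, but how the paper builds and weights its features is not stated in the abstract.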
Data Mining for Regulatory Elements in Yeast Genome
1997
Cited by 12 (3 self)
We have examined methods and developed a general software tool for finding and analyzing combinations of transcription factor binding sites that occur relatively often in gene upstream regions (putative promoter regions) in the yeast genome. Such frequently occurring combinations may be essential parts of possible promoter classes. The regions upstream to all genes were first isolated from the yeast genome database MIPS using the information in the annotation files of the database. The ones that do not overlap with coding regions were chosen for further studies. Next, all occurrences of the yeast transcription factor binding sites, as given in the IMD database, were located in the genome and in the selected regions in particular. Finally, by using a general purpose data mining software in combination with our own software, which parametrizes the search, we can find the combinations of binding sites that occur in the upstream regions more frequently than would be expected on the basis o...
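The final step described in this abstract, finding binding-site combinations that occur in upstream regions more often than expected, can be illustrated for the simplest case of site pairs. This is a minimal sketch under an independence assumption, not the authors' data mining software. The function name, the observed/expected ratio as the score, and the placeholder site labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_enrichment(regions):
    """Map each co-occurring site pair to its observed/expected ratio.

    `regions` is a list of collections of site names, one per upstream region.
    The expected count assumes the two sites occur independently across regions.
    """
    n = len(regions)
    single = Counter()
    pair = Counter()
    for sites in regions:
        unique = sorted(set(sites))  # count each site once per region
        single.update(unique)
        pair.update(combinations(unique, 2))

    ratios = {}
    for (a, b), observed in pair.items():
        # Under independence: n * (single[a] / n) * (single[b] / n)
        expected = single[a] * single[b] / n
        ratios[(a, b)] = observed / expected
    return ratios


# Placeholder site labels, not real transcription factor names.
upstream_regions = [
    {"siteA", "siteB"},
    {"siteA", "siteB"},
    {"siteC"},
    {"siteC"},
]
print(pair_enrichment(upstream_regions))
```

A ratio above 1 means the pair co-occurs more often than independence would predict. A real analysis would also need a significance test and larger combinations, which this sketch leaves out.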
Finding Transcription Factor Binding Site Combinations in the Yeast Genome (Extended Abstract)
In Proceedings of the German Conference on Bioinformatics GCB’97, Kloster Irsee, 1997
Cited by 7 (0 self)
Alvis Brazma, EMBL Outstation -- Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD; brazma@ebi.ac.uk, fax +44 1223 494468, phone +44 1223 494658
Jaak Vilo, Esko Ukkonen, Department of Computer Science, P.O. Box 26, FIN-00014 University of Helsinki, Finland; [vilo,ukkonen]@cs.helsinki.fi
The first complete genome of a eukaryotic organism, namely the yeast S. cerevisiae, has recently been sequenced [3] and is publicly available in the MIPS database [5]. This database contains the DNA sequences, as well as information about the positions and putative features of the predicted genes. The genes in a eukaryotic genome each have a particular combination of binding sites (usually specific DNA sequences 5 to 25 nucleotides long) for transcription factors that activate or repress their transcription. In the yeast genome these sites are normally located in the promoter region within several hundred base pairs upstream from the transcription initiation...
Stochastic Segment Models of Eukaryotic Promoter Regions
Pac. Symp. Biocomput., 2000
Cited by 7 (4 self)
In this paper, we present a new approach for the stochastic modeling of eukaryotic polymerase II promoters, based on the general segmental structure of promoter regions. We could show a clear improvement of a five-state segment model on the classification of fixed-length sequences with respect to our previous approach, which modeled the promoter region as a whole. The results on genomic sequences are also improved, but not yet as much as we expected.
Novel Neural Network Prediction Systems for Human Promoters and Splice Sites
In Gene-Finding and Gene Structure Prediction Workshop, 1995
Cited by 6 (1 self)
We present a detailed theoretical study of the organization and structure of landmark sequences like promoters and splice junctions in human DNA. An improved detection of these landmark sequences in genomic DNA is important for exon detection and gene assembly. The functions of eukaryotic promoters as initiators of transcription and of splice sites as signals for RNA assembly are among the most complex processes in molecular biology. Both consist of multiple functional sites in primary DNA that are involved in the polymerase binding and splicing process, respectively. We analyzed the structure of the individual elements within promoters and splice sites using a novel technique that combines neural networks with weight pruning. For a complete promoter site prediction we combine these single predictions for each element using time-delay neural networks (TDNN). TDNNs are appropriate for recognizing promoter elements because they are able to combine multiple features, even those that ap...

