Prediction of complete gene structures in human genomic DNA
- J. Mol. Biol
, 1997
Cited by 487 (7 self)
The problem of identifying genes in genomic DNA sequences by computational methods has attracted considerable research attention in recent years. From one point of view, the problem is closely
Identification of Genes in Human Genomic DNA
, 1997
Cited by 23 (1 self)
A general probabilistic model of the gene structural and compositional properties of human genomic DNA is introduced and applied to the problem of identifying genes in unannotated human genomic sequences. The model uses a "Hidden semi-Markov" or semi-Markov source architecture which incorporates probabilistic descriptions of fundamental transcriptional, translational and splicing signals, as well as length distributions and compositional features of exons, introns and intergenic regions. Distinct sets of model parameters are derived which account for many of the substantial differences in gene density and structure observed in distinct C+G compositional regions ("isochores") of the human genome. A novel model building procedure, termed Maximal Dependence Decomposition, is introduced which captures potentially important dependencies between non-adjacent as well as adjacent positions in a biological signal. Application of this model to the donor splice signal not only gives better discrimination of potential donor sites than previous probabilistic models, but also reveals subtle properties of this signal which suggest aspects of its biochemical function. Acceptor
Evaluation of gene-finding programs on mammalian sequences
- Genome Res
, 2001
Nature and structure of human genes that generate retropseudogenes
- Genome Res
, 2000
An Optimal Algorithm for the Maximum-Density Segment Problem
- SIAM Journal on Computing
, 2004
Cited by 11 (0 self)
Abstract. We address a fundamental problem arising from analysis of biomolecular sequences. The input consists of two numbers $w_{\min}$ and $w_{\max}$ and a sequence $S$ of $n$ number pairs $(a_i, w_i)$ with $w_i > 0$. Let segment $S(i, j)$ of $S$ be the consecutive subsequence of $S$ between indices $i$ and $j$. The density of $S(i, j)$ is $d(i, j) = (a_i + a_{i+1} + \cdots + a_j)/(w_i + w_{i+1} + \cdots + w_j)$. The maximum-density segment problem is to find a maximum-density segment over all segments $S(i, j)$ with $w_{\min} \le w_i + w_{i+1} + \cdots + w_j \le w_{\max}$. The best previously known algorithm for the problem, due to Goldwasser, Kao, and Lu [Proceedings of the Second International Workshop on Algorithms
Statistical Properties of Open Reading Frames in Complete Genome Sequences
, 1999
Cited by 10 (1 self)
Some statistical properties of open reading frames in all currently available complete genome sequences are analyzed (seventeen prokaryotic genomes, and 16 chromosome sequences from the yeast genome). The size distribution of open reading frames is characterized by various techniques, such as quantile tables, QQ-plots, rank-size plots (Zipf's plots), and spatial densities. The issue of the influence of CG% on the size distribution is addressed. When yeast chromosomes are compared with archaeal and eubacterial genomes, they tend to have more long open reading frames. There is little or no evidence to reject the null hypothesis that open reading frames on six different reading frames and two strands distribute similarly. A topic of current interest, the base composition asymmetry in open reading frames between the two strands, is studied using regression analysis. The base composition asymmetry at three codon positions is analyzed separately. It was shown in these genome sequences that the first codon position is G- and A-rich (i.e. purine-rich); there is a co-existence of A- and T-rich branches at the second codon position; and the third codon position is weakly T-rich.
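To make the notion of a per-frame size distribution concrete, the following is a minimal Python sketch that collects open reading frame lengths in each of the six reading frames. It assumes an ORF runs from an ATG codon to the first in-frame stop codon; the abstract does not state which ORF definition the paper uses (stop-to-stop definitions are also common), and the names `orf_lengths`, `reverse_complement`, and `min_codons` are illustrative only.

```python
# Assumption: an ORF is ATG ... first in-frame stop; lengths are in codons,
# counting the ATG but not the stop codon. This is not taken from the paper.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_lengths(seq, min_codons=1):
    """Map each frame (strand, offset) to a list of ORF lengths in codons."""
    seq = seq.upper()
    result = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            lengths = []
            start = None  # position of the currently open ATG, if any
            for pos in range(offset, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOPS:
                    n_codons = (pos - start) // 3
                    if n_codons >= min_codons:
                        lengths.append(n_codons)
                    start = None
            result[(strand, offset)] = lengths
    return result
```

The per-frame lists returned here could then be sorted to build the kind of quantile tables or rank-size plots the abstract mentions.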
Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications
Cited by 9 (1 self)
We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence $A$ of pairs $(a_i, w_i)$ for $i = 1, \ldots, n$ and $w_i > 0$, a segment $A(i, j)$ is a consecutive subsequence of $A$ starting with index $i$ and ending with index $j$. The width of $A(i, j)$ is $w(i, j) = \sum_{i \le k \le j} w_k$, and the density is $\left(\sum_{i \le k \le j} a_k\right)/w(i, j)$. The maximum-density segment problem takes $A$ and two values $L$ and $U$ as input and asks for a segment of $A$ with the largest possible density among those of width at least $L$ and at most $U$. When $U$ is unbounded, we provide a relatively simple, $O(n)$-time algorithm, improving upon the $O(n \log L)$-time algorithm by Lin, Jiang and Chao. When both $L$ and $U$ are specified, there are no previous nontrivial results. We solve the problem in $O(n)$ time if $w_i = 1$ for all $i$, and more generally in $O(n + n \log(U - L + 1))$ time when $w_i \ge 1$ for all $i$.
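The problem statement above can be illustrated with a small Python sketch. This is a quadratic-time brute-force baseline using prefix sums, useful for checking the definition on small inputs; it is not the linear-time algorithm the abstract describes, and the function name `max_density_segment` and the parameters `lo` and `hi` (standing in for $L$ and $U$) are illustrative.

```python
from itertools import accumulate


def max_density_segment(pairs, lo, hi):
    """Brute-force reference: densest segment with total width in [lo, hi].

    pairs is a list of (a, w) tuples with w > 0. Returns (density, i, j)
    with 0-based inclusive indices, or None if no segment qualifies.
    Ties are resolved in favour of the first segment found.
    """
    a_prefix = [0] + list(accumulate(a for a, _ in pairs))
    w_prefix = [0] + list(accumulate(w for _, w in pairs))
    best = None
    n = len(pairs)
    for i in range(n):
        for j in range(i, n):
            width = w_prefix[j + 1] - w_prefix[i]
            if width > hi:
                break  # widths are positive, so a larger j only widens the segment
            if width >= lo:
                density = (a_prefix[j + 1] - a_prefix[i]) / width
                if best is None or density > best[0]:
                    best = (density, i, j)
    return best
```

Passing `hi=float("inf")` corresponds to the case where $U$ is unbounded.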
Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics
- in Proceedings of the Second International Workshop on Algorithms in Bioinformatics
, 2002
Cited by 3 (1 self)
We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence $A = \langle a_1, a_2, \ldots, a_n \rangle$ of real numbers, a segment $S$ is a consecutive subsequence $\langle a_i, a_{i+1}, \ldots, a_j \rangle$.
Computational Genefinding
, 1998
Cited by 2 (1 self)
Introduction Computational methodology for finding genes and other functional sites in genomic DNA has evolved significantly over the last 20 years. Excellent recent surveys have been given by Guigó [10], Claverie [3], Krogh [14] and others. Among the types of functional sites in genomic DNA that researchers have sought to recognize are splice sites, start and stop codons, branch points, promoters and terminators of transcription, polyadenylation sites, ribosomal binding sites, topoisomerase II binding sites, topoisomerase I cleavage sites, and various transcription factor binding sites [8]. Local sites such as these are called signals, and methods for detecting them may be called signal sensors. Genomic DNA signals can be contrasted with extended and variable length regions such as exons and introns, which are recognized by different methods that may be called content sensors [26]. 2 Signal Sensors The most bas
MPEG Software Simulation Group ftp://ftp.netcom.com/pub/cf/cfogg/mpeg2
Cited by 1 (0 self)
While the genomes of many organisms have been sequenced over the last few years, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed that try to address one part of this problem, which consists of locating the genes along a genome. This paper reviews the existing approaches to predicting genes in eukaryotic genomes and underlines their intrinsic advantages and limitations. The main mathematical models and computational algorithms adopted are also briefly described and the resulting software classified according to both the method and the type of evidence used. Finally, the several difficulties and pitfalls encountered by the programs are detailed, showing that improvements are needed and that new directions must be considered.

