Results 1  10
of
903,911
Significantly Lower Entropy Estimates for Natural DNA Sequences
 Journal of Computational Biology
, 1996
"... If DNA were a random string over its alphabet fA; C; G; Tg, an optimal code would assign 2 bits to each nucleotide. We imagine DNA to be a highly ordered, purposeful molecule, and might therefore reasonably expect statistical models of its string representation to produce much lower entropy estima ..."
Abstract

Cited by 47 (1 self)
 Add to MetaCart
estimates. Surprisingly this has not been the case for many natural DNA sequences, including portions of the human genome. We introduce a new statistical model (compression algorithm), the strongest reported to date, for naturally occurring DNA sequences. Conventional techniques code a nucleotide using only
A MaximumEntropyInspired Parser
, 1999
"... We present a new parser for parsing down to Penn treebank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5,9,10,15,17] "stan dard" se ..."
Abstract

Cited by 963 (19 self)
 Add to MetaCart
" sections of the Wall Street Journal tree bank. This represents a 13% decrease in error rate over the best singleparser results on this corpus [9]. The major technical innova tion is the use of a "maximumentropyinspired" model for conditioning and smoothing that let us successfully to test
Maximum entropy markov models for information extraction and segmentation
, 2000
"... Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many textrelated tasks, such as partofspeech tagging, text segmentation and information extraction. In these cases, the observations are usually modeled as multinomial ..."
Abstract

Cited by 554 (18 self)
 Add to MetaCart
, capitalization, formatting, partofspeech), and defines the conditional probability of state sequences given observation sequences. It does this by using the maximum entropy framework to fit a set of exponential models that represent the probability of a state given an observation and the previous state. We
Tandem repeats finder: a program to analyze DNA sequences
, 1999
"... A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, co ..."
Abstract

Cited by 946 (9 self)
 Add to MetaCart
A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size
Maximum Likelihood Phylogenetic Estimation from DNA Sequences with Variable Rates over Sites: Approximate Methods
 J. Mol. Evol
, 1994
"... Two approximate methods are proposed for maximum likelihood phylogenetic estimation, which allow variable rates of substitution across nucleotide sites. Three data sets with quite different characteristics were analyzed to examine empirically the performance of these methods. The first, called ..."
Abstract

Cited by 540 (28 self)
 Add to MetaCart
Two approximate methods are proposed for maximum likelihood phylogenetic estimation, which allow variable rates of substitution across nucleotide sites. Three data sets with quite different characteristics were analyzed to examine empirically the performance of these methods. The first, called
Sequence Logos: A New Way to Display Consensus Sequences
 Nucleic Acids Res
, 1990
"... INTRODUCTION A logo is "a single piece of type bearing two or more usually separate elements" [1]. In this paper, we use logos to display aligned sets of sequences. Sequence logos concentrate the following information into a single graphic [2]: 1. The general consensus of the sequences. ..."
Abstract

Cited by 638 (27 self)
 Add to MetaCart
. The relative frequencies of every residue at every position. 4. The amount of information present at every position in the sequence, measured in bits. 5. An initiation point, cut point, or other significant location (if appropriate) . Any aligned set of DNA, RNA or protein sequences can be represented using
Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59
, 2008
"... ..."
Efficient similarity search in sequence databases
, 1994
"... We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Anot ..."
Abstract

Cited by 505 (21 self)
 Add to MetaCart
. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lowerdimensionality space by using only the first few Fourier coe cients, we use Rtrees to index
Missing value estimation methods for DNA microarrays
, 2001
"... Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and Kmeans clu ..."
Abstract

Cited by 476 (26 self)
 Add to MetaCart
. In this report, we investigate automated methods for estimating missing data.
Estimating the Support of a HighDimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propo ..."
Abstract

Cited by 766 (29 self)
 Add to MetaCart
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We
Results 1  10
of
903,911