ON THE VARIANCE OF THE OPTIMAL ALIGNMENT SCORE FOR AN ASYMMETRIC SCORING FUNCTION
, 2007
"... We investigate the variance of the optimal alignment score of two independent iid binary, with parameter 1/2, sequences of length n. The scoring function is such that one letter has a somewhat larger score than the other letter. In this setting, we prove that the variance is of order n, and this co ..."
Cited by 2 (2 self)
A Program for Aligning Sentences in Bilingual Corpora
, 1993
"... This paper will describe a method and a program (align) for aligning sentences based on a simple statistical model of character lengths. The program uses the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend ..."
Cited by 529 (5 self)
but 4% of the sentences. Moreover, it is possible to extract a large subcorpus that has a much smaller error rate. By selecting the best-scoring 80% of the alignments, the error rate is reduced from 4% to 0.7%. There were more errors on the English-French subcorpus than on the English-German subcorpus
MUSCLE: multiple sequence alignment with high accuracy and high throughput
 NUCLEIC ACIDS RES
, 2004
"... We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent r ..."
Cited by 2509 (7 self)
The Alignment Template Approach to Statistical Machine Translation
, 2004
"... A phrase-based statistical machine translation approach, the alignment template approach, is described. This translation approach allows for general many-to-many relations between words. Thereby, the context of words is taken into account in the translation model, and local changes in word order f ..."
Cited by 480 (26 self)
–English Canadian Hansards task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese–English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation it yields statistically significantly better NIST scores
Statistical analysis of sequence-structure alignment scores
"... The structural analysis of proteins is fundamental to the analysis of protein functions. In this context, sequence-structure alignment methods are important among the different empirical methods. In order to assess the quality of sequence-structure alignments, a statistical method using a ..."
SUMMATION TEST FOR GAP PENALTIES AND STRONG LAW OF THE LOCAL ALIGNMENT SCORE
, 2005
"... A summation test is proposed to determine admissible types of gap penalties for logarithmic growth of the local alignment score. We also define a converging sequence of log moment generating functions that provide the constants associated with the large deviation rate and logarithmic strong law of t ..."
Effects of Long-Range Correlations in DNA on Sequence Alignment Score Statistics
"... Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score ..."
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
 Nucleic Acids Res.
, 1997
"... The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantia ..."
Cited by 8572 (88 self)
is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped
What is a hidden Markov model?
, 2004
"... Often, problems in biological sequence analysis are just a matter of putting the right label on each residue. In gene identification, we want to label nucleotides as exons, introns, or intergenic sequence. In sequence alignment, we want to associate residues in a query sequence with homologous resi ..."
Cited by 1344 (8 self)
An Algorithm For Locating Non-Overlapping Regions Of Maximum Alignment Score
 SIAM J. Comput
, 1993
"... In this paper we present an O(N^2 log^2 N) algorithm for finding the two non-overlapping substrings of a given string of length N which have the highest-scoring alignment between them. This significantly improves the previously best known bound of O(N^3) for the worst-case complexity of thi ..."
Cited by 29 (2 self)
