Results 1–10 of 28
A greedy algorithm for aligning DNA sequences
 J. COMPUT. BIOL
, 2000
Abstract

Cited by 240 (15 self)
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.
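The greedy-extension idea the abstract alludes to can be illustrated with a small sketch. This is not the paper's own algorithm, but Myers' closely related greedy O(ND) scheme for insertion/deletion distance, which likewise provably matches a dynamic-programming result while visiting only a fraction of the DP table on similar inputs; the function names are ours.

```python
def indel_distance_dp(a, b):
    # Classic dynamic program: edit distance with insertions and
    # deletions only (no substitutions), computed row by row.
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1]
            else:
                cur[j] = 1 + min(prev[j], cur[j - 1])
        prev = cur
    return prev[n]

def indel_distance_greedy(a, b):
    # Myers' greedy O(ND) algorithm: for d = 0, 1, 2, ... extend matching
    # runs ("snakes") as far as possible along each diagonal k = x - y.
    # On inputs that differ only by scattered errors, d stays small and
    # almost all DP cells are skipped.
    m, n = len(a), len(b)
    v = {1: 0}  # v[k] = furthest x reached on diagonal k
    for d in range(m + n + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]      # step down: insert a symbol of b
            else:
                x = v[k - 1] + 1  # step right: delete a symbol of a
            y = x - k
            while x < m and y < n and a[x] == b[y]:
                x += 1            # greedy part: matches are free
                y += 1
            v[k] = x
            if x >= m and y >= n:
                return d
    return m + n
```

Both functions return the same distance; the greedy version simply reaches it without filling the whole table.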
Meaningful Change Detection in Structured Data
 IN PROCEEDINGS OF THE ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA
, 1997
Abstract

Cited by 115 (8 self)
Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This problem is much more challenging than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional "atomic" insert, delete, and update operations, but also on operations that move an entire subtree of nodes, and that copy an entire subtree. These operations allow us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to "minimal" descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of com...
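To make the subtree move and copy operations concrete, here is a minimal sketch of applying a single MOVE or COPY edit to a labeled tree. The `Node`, `find`, `detach`, `move_subtree`, and `copy_subtree` names are illustrative, not the paper's implementation.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def find(root, label):
    # first node with the given label, in depth-first order
    if root.label == label:
        return root
    for c in root.children:
        hit = find(c, label)
        if hit is not None:
            return hit
    return None

def detach(root, label):
    # remove and return the subtree rooted at the first node labeled `label`
    for i, c in enumerate(root.children):
        if c.label == label:
            return root.children.pop(i)
        hit = detach(c, label)
        if hit is not None:
            return hit
    return None

def move_subtree(root, label, new_parent):
    # a single MOVE edit: re-attach a whole subtree elsewhere, instead of
    # describing the change as many node deletes plus node inserts
    find(root, new_parent).children.append(detach(root, label))

def copy_subtree(root, label, new_parent):
    # a single COPY edit: duplicate a whole subtree under another parent
    find(root, new_parent).children.append(copy.deepcopy(find(root, label)))
```

With only atomic operations, moving a subtree of k nodes would cost k deletes plus k inserts; a single MOVE describes the same change far more meaningfully.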
Comparing hierarchical data in external memory
 In 25th Very Large Data Base Conference (VLDB)
, 1999
Abstract

Cited by 51 (2 self)
We present an external-memory algorithm for computing a minimum-cost edit script between two rooted, ordered, labeled trees. The I/O, RAM, and CPU costs of our algorithm are, respectively, 4mn + 7m + 5n, 6S, and O(MN + (M+N)S^1.5), where M and N are the input tree sizes, S is the block size, m = M/S, and n = N/S. This algorithm can make effective use of surplus RAM capacity to quadratically reduce I/O cost. We extend to trees the commonly used mapping from sequence comparison problems to shortest-path problems in edit graphs.
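The string version of the edit-graph-to-shortest-path mapping that the paper extends to trees can be sketched as follows. This is an in-memory Dijkstra illustration with names of our own; the paper's contribution is performing the tree analogue I/O-efficiently.

```python
import heapq

def edit_distance_via_shortest_path(a, b, indel=1, sub=1):
    # Edit graph: vertex (i, j) means "first i chars of a aligned with the
    # first j chars of b". Horizontal/vertical edges cost `indel`; diagonal
    # edges cost 0 on a match and `sub` on a mismatch. The edit distance is
    # the shortest-path cost from (0, 0) to (m, n).
    m, n = len(a), len(b)
    dist = {(0, 0): 0}
    heap = [(0, 0, 0)]  # (distance, i, j)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == (m, n):
            return d
        if d > dist.get((i, j), float("inf")):
            continue  # stale heap entry
        steps = []
        if i < m:
            steps.append((i + 1, j, indel))           # delete a[i]
        if j < n:
            steps.append((i, j + 1, indel))           # insert b[j]
        if i < m and j < n:
            steps.append((i + 1, j + 1, 0 if a[i] == b[j] else sub))
        for ni, nj, w in steps:
            nd = d + w
            if nd < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = nd
                heapq.heappush(heap, (nd, ni, nj))
    return dist[(m, n)]
```

Because all edge weights are non-negative and the graph is a grid DAG plus diagonals, Dijkstra here recovers exactly the value the usual alignment dynamic program computes.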
diffX: An Algorithm to Detect Changes in Multi-Version XML Documents
 IN PROCEEDINGS OF THE CASCON’05
, 2005
Abstract

Cited by 22 (0 self)
This paper presents the diffX algorithm for detecting changes between two versions of an XML document. The identified changes are reported as a script of edit operations. The script, when applied to the first version of the XML document, will produce the second version. The goal is to optimize the runtime of mapping the nodes between the two versions and to minimize the size of the edit script. To achieve this goal, an isolated tree fragment mapping technique is used, in order to iteratively identify the largest matching tree fragments between the tree representations of the two versions of the document. The mapping technique is robust enough to handle differences in both the structure and the content of the two trees. The edit script generated from the mapping acknowledges the different order sensitivity of elements and attributes in the XML data model. The primitives for the edit script comprise both the atomic (node) and non-atomic (subtree) edit operations natural to XML document modification. The runtime of the algorithm is O(n²).
Expected Length of Longest Common Subsequences
Abstract

Cited by 19 (2 self)
Contents: 1 Introduction; 2 Notation and preliminaries (2.1 Notation and basic definitions; 2.2 Longest common subsequences; 2.3 Computing longest common subsequences; 2.4 Expected length of longest common subsequences); 3 Lower bounds (3.1 Css machines; 3.2 Analysis of css machines; 3.3 Design of css machines; 3.4 Labeled css machines); 4 Upper bounds (4.1 Collations; 4.2 Previous upper bounds; 4.3 Simple upper bound (binary alphabet); 4.4 Simple upper bound (alphabet size 3); 4.5 Upper bounds for binary alphabet ...
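The quantity studied in this work, the expected length of the LCS of two random strings, grows linearly in the string length; its growth rate (the Chvátal-Sankoff constant, believed to be roughly 0.81 for a binary alphabet) is exactly what such lower and upper bounds squeeze from both sides. A small Monte Carlo sketch, with illustrative function names, estimates the ratio empirically.

```python
import random

def lcs_length(a, b):
    # standard O(mn) dynamic program for the length of an LCS
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def expected_lcs_ratio(n, sigma=2, trials=200, seed=0):
    # Monte Carlo estimate of E[LCS]/n for two independent uniformly
    # random strings of length n over an alphabet of size sigma. For
    # finite n this underestimates the limiting constant slightly.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(sigma) for _ in range(n)]
        b = [rng.randrange(sigma) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)
```

Simulation gives a point estimate only; the point of the css-machine lower bounds and collation-based upper bounds above is to bracket the constant rigorously.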
A class of edit kernels for SVMs to predict translation initiation sites in eukaryotic mRNAs
 Journal of Computational Biology
, 2004
Abstract

Cited by 15 (0 self)
The prediction of translation initiation sites (TISs) in eukaryotic mRNAs has been a challenging problem in computational molecular biology. In this paper, we present a new algorithm to recognize TISs with a very high accuracy. Our algorithm includes two novel ideas. First, we introduce a class of new sequence-similarity kernels based on string editing, called the edit kernels, for use with support vector machines (SVMs) in a discriminative approach to predict TISs. The edit kernels are simple and have significant biological and probabilistic interpretations. Although the edit kernels are not positive definite, it is easy to make the kernel matrix positive definite by adjusting the parameters. Second, we convert the region of an input mRNA sequence downstream of a putative TIS into an amino acid sequence before applying SVMs to avoid the high redundancy in the genetic code. The algorithm has been implemented and tested on previously published data. Our experimental results on real mRNA data show that both ideas improve the prediction accuracy greatly and our method performs significantly better than those based on neural networks and SVMs with polynomial kernels or the Salzberg kernel.
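One plausible shape for an edit-based kernel is an exponential of the edit distance; this sketch shows that form and how a precomputed Gram matrix for an SVM would be assembled. The function names and the `gamma` parameter are illustrative, not the paper's exact kernel, which is derived from weighted string edit operations.

```python
import math

def edit_distance(a, b):
    # unit-cost Levenshtein distance (insert, delete, substitute)
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # delete x
                           cur[j - 1] + 1,       # insert y
                           prev[j - 1] + (x != y)))  # (mis)match
        prev = cur
    return prev[-1]

def edit_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * d(x, y)). Kernels of this kind need not be
    # positive definite, which is why the paper tunes parameters until
    # the Gram matrix on the training set becomes positive definite.
    return math.exp(-gamma * edit_distance(x, y))

def gram_matrix(seqs, gamma=0.5):
    # symmetric Gram matrix, usable as a precomputed kernel for an SVM
    return [[edit_kernel(s, t, gamma) for t in seqs] for s in seqs]
```

In practice one would pass such a matrix to an SVM implementation that accepts precomputed kernels, after checking (e.g. via its eigenvalues) that the chosen `gamma` makes it positive definite.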
Bit-Parallel LCS-length Computation Revisited
 In Proc. 15th Australasian Workshop on Combinatorial Algorithms (AWOCA)
, 2004
Abstract

Cited by 8 (4 self)
The longest common subsequence (LCS) is a classic and well-studied measure of similarity between two strings A and B. This problem has two variants: determining the length of the LCS (LLCS), and recovering an LCS itself. In this paper we address the first of these two. Let m and n denote the lengths of the strings A and B, respectively, and w denote the computer word size. First we give a slightly improved formula for the bit-parallel O(⌈m/w⌉n) LLCS algorithm of Crochemore et al. [4]. Then we discuss the relative performance of the bit-parallel algorithms and compare our variant against one of the best conventional LLCS algorithms. Finally we propose and evaluate an O(⌈d/w⌉n) version of the algorithm, where d is the simple (indel) edit distance between A and B.
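The flavor of the bit-parallel computation can be shown with a single-word sketch in the style of the Crochemore et al. algorithm the paper refines. Python integers stand in for machine words, so the ⌈m/w⌉ factor collapses to one word-sized update per symbol of B; the update formula below is one known variant, not necessarily the paper's improved one.

```python
def llcs_bitparallel(a, b):
    # Each column of the LCS dynamic-programming matrix is packed into the
    # bits of v; processing one symbol of b then costs a constant number of
    # word operations. After the loop, the number of zero bits among the
    # low m bits of v equals the LCS length.
    m = len(a)
    mask = (1 << m) - 1
    # preprocessing: match bit-vector for every symbol occurring in a
    match = {}
    for i, c in enumerate(a):
        match[c] = match.get(c, 0) | (1 << i)
    v = mask  # all ones: LCS length 0 everywhere so far
    for c in b:
        u = v & match.get(c, 0)
        v = ((v + u) | (v & ~u)) & mask  # one-word column update
    return m - bin(v).count("1")
```

For strings longer than one machine word a real implementation splits v into ⌈m/w⌉ words and propagates the addition carry between them, which is where the O(⌈m/w⌉n) bound comes from.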
A comparative study for XML change detection
, 2002
Abstract

Cited by 6 (2 self)
Change detection is an important part of version management for databases and document archives. The success of XML has recently renewed interest in change detection on trees and semi-structured data, and various algorithms have been proposed. We study here different algorithms and representations of changes based on their formal definition and on experiments conducted over XML data from the Web. Our goal is to provide an evaluation of the quality of the results, the performance of the tools and, based on this, guide the users in choosing the appropriate solution for their applications.
Maximal Common Subsequences and Minimal Common Supersequences
, 1995
Abstract

Cited by 4 (2 self)
The problems of finding a longest common subsequence and a shortest common supersequence of a set of strings are well-known. They can be solved in polynomial time for two strings (in fact the problems are dual in this case), or for any fixed number of strings, by dynamic programming.
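The duality mentioned for two strings is the identity |SCS(A,B)| = |A| + |B| − |LCS(A,B)|: every symbol of A and B appears once in a shortest common supersequence except that each LCS symbol is shared. A short dynamic-programming sketch (illustrative code, not from the paper) exhibits it:

```python
def lcs_len(a, b):
    # longest common subsequence length, row-by-row dynamic program
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def scs_len(a, b):
    # shortest common supersequence length: on a match, consume both
    # strings with one output symbol; otherwise emit a symbol from one
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else 1 + min(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

For three or more strings this duality breaks down, and both problems become NP-hard for an unbounded number of strings, which is why the fixed-number case above is stated separately.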
Statistical Alignment: Recent Progress, New Applications, and Challenges
Abstract

Cited by 4 (1 self)
Two papers by Thorne, Kishino, and Felsenstein in the early 1990s provided a basis for performing alignment within a statistical framework. Here we review progress and associated challenges in the investigation of models of insertions and deletions in biological sequences stemming from this early ...