Meaningful Change Detection in Structured Data
- IN PROCEEDINGS OF THE ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA
, 1997
Cited by 103 (8 self)
Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This problem is much more challenging than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional "atomic" insert, delete, and update operations, but also on operations that move an entire sub-tree of nodes, and that copy an entire sub-tree. These operations allow us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to "minimal" descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of com...
Exact and Approximation Algorithms for Sorting By Reversals, With Application to Genome Rearrangement
, 1995
Cited by 70 (4 self)
Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and a reversal takes an arbitrary substring of elements and reverses their order. For this problem we develop two algorithms: a greedy approximation algorithm that finds a solution provably close to optimal in O(n^2) time and O(n) space for an n-element permutation, and a branch-and-bound exact algorithm that finds an optimal solution in O(m L(n, n)) time and O(n^2) space, where m is the size of the branch-and-bound search tree and L(n, n) is the time to solve a linear program of n variables and n constraints. The greedy algorithm is the first to come within a constant factor of the optimum; it guarantees a solution that uses no more than twice the min...
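The reversal operation described in this abstract can be illustrated with a small sketch. This is not the greedy or branch-and-bound algorithm of the paper; it only shows a reversal on an unsigned permutation of 1..n and the standard breakpoint count, which gives a simple lower bound on the number of reversals needed to reach the identity (one reversal changes at most two adjacencies). The function names `reverse`, `breakpoints`, and `reversal_lower_bound` are hypothetical, and taking the identity as the target is an assumption made for illustration.

```python
def reverse(perm, i, j):
    """Return a copy of perm with the segment perm[i..j] (inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def breakpoints(perm):
    """Count adjacent pairs in 0, perm, n+1 that are not consecutive integers."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def reversal_lower_bound(perm):
    """A reversal removes at most two breakpoints, so at least ceil(b / 2) are needed."""
    return (breakpoints(perm) + 1) // 2


if __name__ == "__main__":
    p = [3, 1, 2, 4]
    print(breakpoints(p), reversal_lower_bound(p))
    p = reverse(p, 0, 2)   # [2, 1, 3, 4]
    p = reverse(p, 0, 1)   # [1, 2, 3, 4]
    print(p, breakpoints(p))
```

Here the lower bound for [3, 1, 2, 4] is two reversals, and the two reversals shown do reach the identity, so the bound happens to be tight for this input.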
Block Edit Models for Approximate String Matching
- Theoretical Computer Science
, 1997
Cited by 44 (4 self)
In this paper we examine string block edit distance, in which two strings A and B are compared by extracting collections of substrings and placing them into correspondence. This model accounts for certain phenomena encountered in important real-world applications, including pen computing and molecular biology. The basic problem admits a family of variations depending on whether the strings must be matched in their entireties, and whether overlap is permitted. We show that several variants are NP-complete, and give polynomial-time algorithms for solving the remainder.
Keywords: block edit distance, approximate string matching, sequence comparison, approximate ink matching, dynamic programming.
1 Introduction
The edit distance model for string comparison [Lev66, NW70, WF74] has found widespread application in fields ranging from molecular biology to bird song classification [SK83]. A great deal of research has been devoted to this area, and numerous algorithms have been proposed for com...
Faster Algorithms for String Matching with k Mismatches
- J. OF ALGORITHMS
, 2000
Cited by 39 (9 self)
The string matching with mismatches problem is that of finding the number of mismatches between a pattern P of length m and every length-m substring of the text T. Currently, the fastest algorithms for this problem are the following. The Landau-Vishkin algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time O(nk). The Abrahamson algorithm finds the number of mismatches at every location in time O(n sqrt(m log m)). We present
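To make the problem statement concrete, here is a minimal brute-force sketch that counts mismatches at every alignment in O(nm) time. It is a baseline only, not the Landau-Vishkin or Abrahamson algorithm and not the algorithm of this paper; the function names `mismatch_counts` and `k_mismatch_locations` are hypothetical.

```python
def mismatch_counts(text, pattern):
    """Number of mismatching positions between pattern and each length-m window of text."""
    n, m = len(text), len(pattern)
    return [
        sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
        for i in range(n - m + 1)
    ]


def k_mismatch_locations(text, pattern, k):
    """Start positions where pattern matches text with at most k mismatches."""
    return [i for i, c in enumerate(mismatch_counts(text, pattern)) if c <= k]


if __name__ == "__main__":
    print(mismatch_counts("abcabd", "abd"))
    print(k_mismatch_locations("abcabd", "abd", 1))
```

The faster algorithms cited in the abstract compute the same output; the sketch is useful mainly as a reference against which to check them on small inputs.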
Tree-to-tree Correction for Document Trees
, 1995
Cited by 18 (0 self)
Documents can be represented as ordered labelled trees. Finding the editing distance between documents is a particular case of the general problem for trees. We give a detailed survey of previous results, presenting them in a single notation to elucidate their commonalities. We then discuss two ways of extending these results: first, by changing the set of primitive editing operations used by existing algorithms and, second, by post-processing the output of the algorithms to recognize patterns of change significant to documents. Finally, we provide extensions of the first type. Our algorithm allows subtree operations but is otherwise similar to that of Zhang and Shasha. This is a corrected and expanded version of Technical Report 91-315. This report was completed during a sabbatical at INRIA (Institut National de Recherche en Informatique et en Automatique) in Rocquencourt, France.
Contents: 1 Introduction; 2 Background; 2.1 String-to-String Correction: Wagner and Fischer ...
Pattern Matching with Swaps
, 1997
Cited by 17 (7 self)
Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version T' of T is a length-n string derived from T by a series of local swaps (i.e., t'_ℓ ← t_{ℓ+1} and t'_{ℓ+1} ← t_ℓ), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations i for which there exists a swapped version T' of T where there is an exact matching of P in location i of T'. It has been an open problem whether swapped matching can be done in less than O(mn) time. In this paper we show the first algorithm that solves the pattern matching with swaps problem in time o(mn). We present an algorithm whose time complexity is O(n m^{1/3} log m log^2 min(m, |Σ|)) for a general alphabet Σ.
Key Words: Design and analysis of algorithms, combinatorial algorithms on words, pattern matching, pattern matching with swaps, non-standard pattern matching.
Department of Mathematics...
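The O(mn) baseline that this abstract improves on can be sketched as follows. This is an illustrative brute-force check, not the paper's algorithm, and it makes one simplifying assumption: only swaps between two positions that both lie inside the compared window are considered, so a swap across the window boundary is ignored. The names `swap_match_at` and `swap_match_locations` are hypothetical. The left-to-right scan is safe because swapping is only useful where the text and pattern characters differ.

```python
def swap_match_at(text, pattern, i):
    """True if pattern matches text at position i after disjoint adjacent swaps.

    Assumption for this sketch: swaps are confined to the window text[i:i+m].
    """
    m = len(pattern)
    j = 0
    while j < m:
        if text[i + j] == pattern[j]:
            j += 1  # characters agree, no swap needed here
        elif (j + 1 < m
              and text[i + j + 1] == pattern[j]
              and text[i + j] == pattern[j + 1]):
            j += 2  # swap positions i+j and i+j+1; both are now used up
        else:
            return False
    return True


def swap_match_locations(text, pattern):
    """All start positions with a swapped match, found in O(nm) time."""
    return [
        i for i in range(len(text) - len(pattern) + 1)
        if swap_match_at(text, pattern, i)
    ]


if __name__ == "__main__":
    print(swap_match_locations("abba", "ab"))
    print(swap_match_locations("bacd", "abdc"))
```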
Overlap Matching
- Information and Computation
, 2001
Cited by 13 (4 self)
We propose a new paradigm for string matching, namely structural matching. In structural matching, the text and pattern contents are not important. Rather, some areas in the text and patterns are singled out, say intervals. A "match" is a text location where a specified relation between the text and pattern areas is satisfied. In particular we define the structural matching problem of Overlap (Parity) Matching. We seek the text locations where all overlaps of the given pattern and text intervals have even length. We show that this problem can be solved in time O(n log m), where the text length is n and the pattern length is m. As an application of overlap matching, we show how to reduce the String Matching with Swaps problem to the overlap matching problem. The String Matching with Swaps problem is the problem of string matching in the presence of local swaps. The best known deterministic upper bound for this problem was O(n m^{1/3} log m log σ) for a general alphabet Σ, wher...
Efficient Algorithms for Approximate String Matching with Swaps
- in LNCS 1264, Combinatorial Pattern Matching
, 1999
Cited by 4 (0 self)
In this paper we include the swap operation that interchanges two adjacent characters into the set of allowable edit operations, and we present an O(t min(m, n))-time algorithm for the extended edit distance problem, where t is the edit distance between the given strings, and an O(kn)-time algorithm for the extended k-differences problem. That is, we add swaps into the set of edit operations without increasing the time complexities of previous algorithms that consider only changes, insertions, and deletions for the edit distance and k-differences problems. © 1999 Academic Press
1. INTRODUCTION
Given two strings A[1..m] and B[1..n] over an alphabet Σ, the edit distance between A and ...
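As a point of reference for the edit operations named above, the following is a minimal O(mn) dynamic-programming sketch of edit distance with adjacent swaps. It is not the O(t min(m, n)) algorithm of the paper. It also assumes unit costs and the restricted variant, often called optimal string alignment, in which no substring is edited more than once; this can differ from the unrestricted extended edit distance on some inputs. The name `osa_distance` is hypothetical.

```python
def osa_distance(a, b):
    """Edit distance with changes, insertions, deletions, and adjacent swaps.

    Restricted variant (assumption): a swapped pair is not edited again.
    """
    m, n = len(a), len(b)
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or change
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[m][n]


if __name__ == "__main__":
    print(osa_distance("ab", "ba"))
    print(osa_distance("kitten", "sitting"))
```

Without the swap case, "ab" and "ba" would be at distance 2 (two changes); with it, a single swap suffices.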
Efficient Algorithms for Sequence Analysis with Concave and Convex Gap Costs
, 1989
Cited by 4 (0 self)
David A. Eppstein
We describe algorithms for two problems in sequence analysis: sequence alignment with gaps (multiple consecutive insertions and deletions treated as a unit) and RNA secondary structure with single loops only. We make the assumption that the gap cost or loop cost is a convex or concave function of the length of the gap or loop, and show how this assumption may be used to develop efficient algorithms for these problems. We show how the restriction to convex or concave functions may be relaxed, and give algorithms for solving the problems when the cost functions are neither convex nor concave, but can be split into a small number of convex or concave functions. Finally we point out some sparsity in the structure of our sequence analysis problems, and describe how we may take advantage of that sparsity to further speed up our algorithms.
Contents: 1. Introduction ...
Sequence similarity: a nonaligning technique
- Sociological Methods and Research
, 2003
Cited by 4 (0 self)
This article reviews objections to optimal-matching (OM) algorithms in sequence analysis and reformulates the concept of sequence similarity in terms of a binary precedence relation. This precedence relation is then used to develop a new quantification of sequence similarity. The new measure is used to reanalyze the life history data that were previously discussed by Dijkstra and Taris (1995). The reanalysis demonstrates the new measure to be superior to the OM algorithm and the alternatives proposed by Dijkstra and Taris. A new algorithm is presented to enumerate matching k-tuples from pairs of sequences in polynomial time.

