Results 1 - 10 of 29
Change Detection in Hierarchically Structured Information
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1996
Cited by 125 (11 self)
Abstract
Detecting and representing changes to data is important for active databases, data warehousing, view maintenance, and version and configuration management. Most previous work in change management has dealt with flat-file and relational data
A vision for management of complex models
SIGMOD Record, 2000
Cited by 114 (20 self)
Abstract
Many problems encountered when building applications of database systems involve the manipulation of models. By “model,” we mean a complex structure that represents a design artifact, such as a relational schema, object-oriented interface, UML model, XML DTD, web-site schema, semantic network, complex document, or software configuration. Many uses of models involve managing changes in models and transformations of data from one model into another. These uses require an explicit representation of “mappings” between models. We propose to make database systems easier to use for these applications by making “model” and “model mapping” first-class objects with special operations that simplify their use. We call this capability model management. In addition to making the case for model management, our main contribution is a sketch of a proposed data model. The data model consists of formal, object-oriented structures for representing models and model mappings, and of high-level algebraic operations on those structures, such as matching, differencing, merging, function application, selection, inversion and instantiation. We focus on structure and semantics, not implementation.
Meaningful Change Detection in Structured Data
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1997
Cited by 103 (8 self)
Abstract
Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This problem is much more challenging than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional "atomic" insert, delete, and update operations, but also on operations that move an entire subtree of nodes and that copy an entire subtree. These operations allow us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, we present a heuristic change detection algorithm that yields close to "minimal" descriptions of the changes and has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of com...
Detecting Changes in XML Documents
In ICDE, 2001
Cited by 102 (1 self)
Abstract
We present a diff algorithm for XML data. This work is motivated by the need to support change control in the Xyleme project, which is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of this context, our algorithm has to be very efficient in terms of speed and memory space, even at the cost of some loss of "quality". Besides insertions, deletions, and updates (standard in diffs), it also considers a move operation on subtrees, which is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML-specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs in linear time on average, versus quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading away some quality. We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the "optimal" in terms of quality. Finally, we present experiments on a small sample of XML pages found on the Web.
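The signature-matching step described in this abstract can be illustrated with a small sketch. This is a simplified stand-in, not the Xyleme diff implementation: it assumes a hypothetical `Node` class, uses a SHA-1 hash of each node's label plus its children's signatures as the subtree signature, and greedily pairs identical subtrees, largest first. It omits the propagation of matches to ancestors and descendants, ID attributes, and move detection.

```python
import hashlib
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node; not part of any XML library."""
    label: str
    children: list["Node"] = field(default_factory=list)


def preorder(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from preorder(child)


def annotate(node: Node, sig: dict[int, str], size: dict[int, int]) -> str:
    """Bottom-up: a subtree's signature hashes its label and its children's signatures."""
    child_sigs = [annotate(c, sig, size) for c in node.children]
    payload = repr(node.label) + "(" + ",".join(child_sigs) + ")"
    sig[id(node)] = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    size[id(node)] = 1 + sum(size[id(c)] for c in node.children)
    return sig[id(node)]


def match_unchanged(old: Node, new: Node) -> list[tuple[Node, Node]]:
    """Greedily pair identical subtrees of `old` and `new`, largest first."""
    old_sig: dict[int, str] = {}
    old_size: dict[int, int] = {}
    new_sig: dict[int, str] = {}
    new_size: dict[int, int] = {}
    annotate(old, old_sig, old_size)
    annotate(new, new_sig, new_size)

    # Old nodes grouped by signature, in pre-order.
    candidates: dict[str, list[Node]] = {}
    for node in preorder(old):
        candidates.setdefault(old_sig[id(node)], []).append(node)

    used_old: set[int] = set()
    used_new: set[int] = set()
    matches: list[tuple[Node, Node]] = []
    # Visiting larger new subtrees first means a match never lands inside
    # (or around) a region that has already been matched.
    for node in sorted(preorder(new), key=lambda n: -new_size[id(n)]):
        if id(node) in used_new:
            continue  # an ancestor was already matched as a whole
        for cand in candidates.get(new_sig[id(node)], []):
            if id(cand) not in used_old:
                matches.append((cand, node))
                used_old.update(id(n) for n in preorder(cand))
                used_new.update(id(n) for n in preorder(node))
                break
    return matches
```

Matching larger subtrees first means an unchanged region is reported once, as a whole, rather than node by node; nodes left unmatched would then be candidates for insertions, deletions, updates, or moves.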
A General Edit Distance between RNA Structures
2001
Cited by 51 (0 self)
Abstract
Arc-annotated sequences are useful in representing the structural information of RNA sequences.
Approximate Tree Matching in the Presence of Variable Length Don't Cares
Journal of Algorithms, 1993
Cited by 37 (7 self)
Abstract
Ordered labeled trees are trees in which the sibling order matters. This paper presents algorithms for three problems having to do with approximate matching for such trees with variable-length don't cares (VLDCs). In strings, a VLDC symbol in the pattern may substitute for zero or more symbols in the data string. For example, if the pattern is "com" followed by a VLDC symbol followed by "er", the VLDC would substitute for the substring "put" when matching the data string "computer". Approximate VLDC matching in strings means that, after the best possible substitution, the pattern still need not be identical to the data string for a match to be allowed. For example, the same pattern matches "counter" within distance 1 (representing the cost of removing the "m" from the pattern and having the VLDC substitute for "unt"). We generalize approximate VLDC string matching to three algorithms for approximate VLDC matching on trees. The time complexity of our algorithms is O(|P| × |D| × min(depth(P), leaves(P)) × min(de...
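The string case that this abstract generalizes can be sketched as a small dynamic program. This is an illustrative sketch only, not the paper's tree algorithm: it assumes unit costs for insertions, deletions, and relabelings, uses `*` as a stand-in VLDC symbol (the original glyph is not shown in this listing), and lets each VLDC absorb any substring of the data string at no cost.

```python
def vldc_distance(pattern: str, text: str, vldc: str = "*") -> int:
    """Unit-cost edit distance in which each `vldc` symbol in `pattern`
    may stand for any substring of `text` (including the empty one) for free.
    The choice of '*' as the VLDC symbol is an assumption of this sketch."""
    m, n = len(pattern), len(text)
    # d[i][j]: cheapest way to turn pattern[:i] into text[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        d[0][j] = j  # insert every text symbol
    for i in range(1, m + 1):
        p = pattern[i - 1]
        # A VLDC can match the empty prefix for free; other symbols must be deleted.
        d[i][0] = d[i - 1][0] + (0 if p == vldc else 1)
        for j in range(1, n + 1):
            if p == vldc:
                # The VLDC either matches the empty string here,
                # or extends its match to include text[j - 1].
                d[i][j] = min(d[i - 1][j], d[i][j - 1])
            else:
                d[i][j] = min(
                    d[i - 1][j] + 1,  # delete the pattern symbol
                    d[i][j - 1] + 1,  # insert the text symbol
                    d[i - 1][j - 1] + (p != text[j - 1]),  # match or relabel
                )
    return d[m][n]
```

With these assumptions, `vldc_distance("com*er", "computer")` returns 0 and `vldc_distance("com*er", "counter")` returns 1, matching the two examples in the abstract.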
An optimal decomposition algorithm for tree edit distance
In Proceedings of the 34th International Colloquium on Automata, Languages and Programming (ICALP), 2007
Cited by 28 (2 self)
Abstract
The edit distance between two ordered rooted trees with vertex labels is the minimum cost of transforming one tree into the other by a sequence of elementary operations consisting of deleting and relabeling existing nodes, as well as inserting new nodes. In this paper, we present a worst-case O(n³)-time algorithm for this problem, improving the previous best O(n³ log n)-time algorithm [9]. Our result requires a novel adaptive strategy for deciding how a dynamic program divides into subproblems, together with a deeper understanding of the previous algorithms for the problem. We prove the optimality of our algorithm among the family of decomposition strategy algorithms (which also includes the previous fastest algorithms) by tightening the known lower bound of Ω(n² log² n) [6] to Ω(n³), matching our algorithm's running time. Furthermore, we obtain matching upper and lower bounds of Θ(nm²(1 + log(n/m))) when the two trees have sizes m and n where m < n.
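To make the quantity in this abstract concrete, here is a sketch of the classic memoized recursion on ordered forests that always decomposes at the rightmost root. It is not the optimal adaptive decomposition algorithm described above; it assumes unit costs for delete, insert, and relabel, and a hypothetical encoding of a tree as a `(label, children)` tuple, and it is only practical for small trees.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children), children is a tuple of
# trees, and a forest is a tuple of trees (so everything is hashable).


def node(label: str, *children: tuple) -> tuple:
    """Convenience constructor for the tuple encoding above."""
    return (label, children)


def forest_size(forest: tuple) -> int:
    """Number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f: tuple, g: tuple) -> int:
    """Unit-cost edit distance between two ordered forests (rightmost-root recursion)."""
    if not f:
        return forest_size(g)  # insert everything in g
    if not g:
        return forest_size(f)  # delete everything in f
    (a, f_kids), (b, g_kids) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + f_kids, g) + 1,  # delete f's rightmost root
        forest_distance(f, g[:-1] + g_kids) + 1,  # insert g's rightmost root
        forest_distance(f_kids, g_kids)  # map the two roots onto each other...
        + forest_distance(f[:-1], g[:-1])  # ...and the remaining forests
        + (a != b),  # relabeling costs 1 if the labels differ
    )


def tree_edit_distance(t1: tuple, t2: tuple) -> int:
    return forest_distance((t1,), (t2,))
```

For example, with `t1 = node("f", node("d", node("a"), node("c", node("b"))), node("e"))` and `t2 = node("f", node("c", node("d", node("a"), node("b"))), node("e"))`, `tree_edit_distance(t1, t2)` evaluates to 2: delete the `c` under `d`, then insert a `c` above `d`. The decomposition algorithms discussed in the abstract differ mainly in how they choose which root to split on at each step.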
A methodology for evaluating theory revision systems: Results with Audrey II
In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 1993
Cited by 23 (0 self)
Abstract
Theory revision systems are learning systems whose goal is to make small changes to an original theory to account for new data. A measure for the distance between two theories is proposed. This measure corresponds to the minimum number of edit operations at the literal level required to transform one theory into another. By computing the distance between an original theory and a revised theory, the claim that a theory revision system makes few revisions to a theory may be quantitatively evaluated. We present data using both accuracy and the distance metric on Audrey II, a first-order theory revision system.
Automating the Transformation of XML Documents
In Proc. WIDM’01, 2001
Cited by 20 (0 self)
Abstract
The advent of web services that use XML-based message exchanges has spurred many efforts that address issues related to inter-enterprise service electronic commerce interactions. Currently emerging standards and technologies enable enterprises to describe and advertise their own Web Services and to discover and determine how to interact with services fronted by other businesses. However, these technologies do not address the problem of how to reconcile structural differences between similar types of documents supported by different enterprises. Transformations between such documents must thus be created manually on a case-by-case basis. In this paper, we explore the problem of how to automate the transformation of XML E-business documents. We develop an integrated solution that automates, as far as possible, all steps of the document transformation process. First, we propose a set of schema transformation operations that establish semantic relationships between two XML document schemas. Second, we define a model that allows us to compare the cost of performing these operations. Third, we introduce an algorithm that discovers an efficient sequence of operations for transforming a source document schema into a target document schema based on our cost model. The operation sequence is then used to generate an equivalent XSLT transformation script. Experimental results indicate that our algorithm can satisfactorily discover acceptable transformations.
Tree-to-tree Correction for Document Trees
1995
Cited by 18 (0 self)
Abstract
Documents can be represented as ordered labelled trees. Finding the editing distance between documents is a particular case of the general problem for trees. We give a detailed survey of previous results, presenting them in a single notation to elucidate their commonalities. We then discuss two ways of extending these results: first, by changing the set of primitive editing operations used by existing algorithms and, second, by post-processing the output of the algorithms to recognize patterns of change significant to documents. Finally, we provide extensions of the first type. Our algorithm allows subtree operations but is otherwise similar to that of Zhang and Shasha. This is a corrected and expanded version of Technical Report 91-315. This report was completed during a sabbatical at INRIA (Institut National de Recherche en Informatique et en Automatique) in Rocquencourt, France.
Contents: 1 Introduction; 2 Background; 2.1 String-to-String Correction: Wagner and Fischer ...

