Continual Queries for Internet Scale Event-Driven Information Delivery
- IEEE Transactions on Knowledge and Data Engineering, 1999
- Cited by 153 (13 self)
In this paper we introduce the concept of continual queries, describe the design of a distributed event-driven continual query system -- OpenCQ, and outline the initial implementation of OpenCQ on top of the distributed interoperable information mediation system DIOM [21, 19]. Continual queries are standing queries that monitor updates of interest and return results whenever the updates reach specified thresholds. In OpenCQ, users may specify to the system the information they would like to monitor (such as the events or the update thresholds they are interested in). Whenever the information of interest becomes available, the system immediately delivers it to the relevant users; otherwise, the system continually monitors the arrival of the desired information and pushes it to the relevant users as it meets the specified update thresholds. In contrast to conventional pull-based data management systems such as DBMSs and Web search engines, OpenCQ exhibits two important featu...
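The standing-query idea in this abstract can be illustrated with a minimal sketch. It is not OpenCQ's design or API: the `ContinualQuery` class, its fields, and the rule "push when the result has moved by at least `threshold` since the last delivery" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ContinualQuery:
    """Hypothetical standing query: fires when the change reaches a threshold."""
    query: Callable[[List[float]], float]   # computes a result over the data
    threshold: float                        # assumed form of an update threshold
    notify: Callable[[float], None]         # push delivery to the user
    last_delivered: float = 0.0

    def on_update(self, data: List[float]) -> bool:
        """Re-evaluate after a source update; push only if the change is large enough."""
        result = self.query(data)
        if abs(result - self.last_delivered) >= self.threshold:
            self.last_delivered = result
            self.notify(result)
            return True
        return False


delivered: List[float] = []
cq = ContinualQuery(query=sum, threshold=10.0, notify=delivered.append)

data: List[float] = []
for value in [3.0, 4.0, 5.0, 1.0, 12.0]:
    data.append(value)   # an update arrives at the source
    cq.on_update(data)   # the system, not the user, triggers re-evaluation

print(delivered)
```

The user registers the query once; results are pushed only on the third and fifth updates, when the running sum has moved by at least 10 since the last delivery.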
A vision for management of complex models
- SIGMOD Record, 2000
- Cited by 114 (20 self)
Many problems encountered when building applications of database systems involve the manipulation of models. By “model,” we mean a complex structure that represents a design artifact, such as a relational schema, object-oriented interface, UML model, XML DTD, web-site schema, semantic network, complex document, or software configuration. Many uses of models involve managing changes in models and transformations of data from one model into another. These uses require an explicit representation of “mappings” between models. We propose to make database systems easier to use for these applications by making “model” and “model mapping” first-class objects with special operations that simplify their use. We call this capability model management. In addition to making the case for model management, our main contribution is a sketch of a proposed data model. The data model consists of formal, object-oriented structures for representing models and model mappings, and of high-level algebraic operations on those structures, such as matching, differencing, merging, function application, selection, inversion and instantiation. We focus on structure and semantics, not implementation.
Meaningful Change Detection in Structured Data
- In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1997
- Cited by 103 (8 self)
Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This problem is much more challenging than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional "atomic" insert, delete, update operations, but also on operations that move an entire sub-tree of nodes, and that copy an entire sub-tree. These operations allow us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to "minimal" descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of com...
Detecting Changes in XML Documents
- In ICDE, 2001
- Cited by 102 (1 self)
We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of the context, our algorithm has to be very efficient in terms of speed and memory space even at the cost of some loss of "quality". Also, it considers, besides insertions, deletions and updates (standard in diffs), a move operation on subtrees that is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs on average in linear time vs. quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading some quality. We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the "optimal" in terms of quality. Finally we present experiments on a small sample of XML pages found on the Web.
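The signature step described in this abstract can be sketched as follows. This is only an illustration of matching unchanged subtrees by hash, not the paper's algorithm: it ignores attributes and ID information, does not prioritise large subtrees, and does not propagate matches to ancestors or descendants. The function names `signature` and `match_unchanged` are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def signature(node: ET.Element, table: Dict[str, List[ET.Element]]) -> str:
    """Hash a subtree bottom-up from its tag, text and child signatures."""
    child_sigs = [signature(child, table) for child in node]
    text = (node.text or "").strip()
    raw = "|".join([node.tag, text] + child_sigs)
    sig = hashlib.sha1(raw.encode("utf-8")).hexdigest()
    table.setdefault(sig, []).append(node)   # index every subtree by signature
    return sig


def match_unchanged(old: ET.Element, new: ET.Element) -> List[Tuple[ET.Element, ET.Element]]:
    """Pair up subtrees whose signatures are identical in both versions."""
    old_table: Dict[str, List[ET.Element]] = {}
    new_table: Dict[str, List[ET.Element]] = {}
    signature(old, old_table)
    signature(new, new_table)
    matches = []
    for sig, new_nodes in new_table.items():
        # zip pairs candidates one-to-one when a signature occurs several times
        for old_node, new_node in zip(old_table.get(sig, []), new_nodes):
            matches.append((old_node, new_node))
    return matches


old = ET.fromstring(
    "<cat><item><name>tv</name><price>100</price></item>"
    "<item><name>pc</name><price>900</price></item></cat>"
)
new = ET.fromstring(
    "<cat><item><name>pc</name><price>900</price></item>"
    "<item><name>tv</name><price>120</price></item></cat>"
)
matches = match_unchanged(old, new)
matched_tags = sorted(n.tag for _, n in matches)
print(matched_tags)
```

In this example the `pc` item is matched as a whole even though it changed position, which is the situation a move operation would describe; the `tv` item is not matched because its price changed.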
Archiving scientific data
- In ACM SIGMOD, 2002
- Cited by 97 (8 self)
Archiving is important for scientific data, where it is necessary to record all past versions of a database in order to verify findings based upon a specific version. Much scientific data is held in a hierarchical format and has a key structure that provides a canonical identification for each element of the hierarchy. In this article, we exploit these properties to develop an archiving technique that is both efficient in its use of space and preserves the continuity of elements through versions of the database, something that is not provided by traditional minimum-edit-distance diff approaches. The approach also uses timestamps. All versions of the data are merged into one hierarchy where an element appearing in multiple versions is stored only once along with a timestamp. By identifying the semantic continuity of elements and merging them into one data structure, our technique is capable of providing meaningful change descriptions. The archive also allows us to easily answer certain temporal queries such as retrieval of any specific version from the archive and finding the history of an element. This is in contrast with approaches that store a sequence of deltas where such operations may require undoing a large number of changes or significant reasoning with the
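A small sketch of the key-based merge with timestamps described above. It is an assumption-laden simplification, not the paper's technique: the hierarchy is flattened into key paths (tuples), a timestamp is modelled as the set of version numbers in which a value holds, and the `Archive` class and its methods are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

KeyPath = Tuple[str, ...]   # assumed canonical identification of an element


class Archive:
    """All versions merged into one structure; each value is stored once."""

    def __init__(self) -> None:
        # key path -> list of (value, versions in which that value holds)
        self.entries: Dict[KeyPath, List[Tuple[str, Set[int]]]] = {}

    def add_version(self, version: int, snapshot: Dict[KeyPath, str]) -> None:
        for key, value in snapshot.items():
            variants = self.entries.setdefault(key, [])
            for stored_value, versions in variants:
                if stored_value == value:
                    versions.add(version)        # unchanged: extend the timestamp
                    break
            else:
                variants.append((value, {version}))  # new or changed value

    def retrieve(self, version: int) -> Dict[KeyPath, str]:
        """Reconstruct one version without undoing any deltas."""
        return {
            key: value
            for key, variants in self.entries.items()
            for value, versions in variants
            if version in versions
        }

    def history(self, key: KeyPath) -> List[Tuple[str, List[int]]]:
        """List every value an element has had, with the versions it held in."""
        return [(value, sorted(versions)) for value, versions in self.entries.get(key, [])]


v1 = {("gene", "g1", "name"): "abc", ("gene", "g2", "name"): "xyz"}
v2 = {("gene", "g1", "name"): "abd", ("gene", "g2", "name"): "xyz"}

archive = Archive()
archive.add_version(1, v1)
archive.add_version(2, v2)
print(archive.history(("gene", "g1", "name")))
```

Because elements are identified by key rather than by position, the history of `g1` is a direct lookup, and the unchanged `g2` is stored once with both versions in its timestamp.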
Algorithmics and Applications of Tree and Graph Searching
- In Symposium on Principles of Database Systems, 2002
- Cited by 89 (8 self)
Modern search engines answer keyword-based queries extremely efficiently. The impressive speed is due to clever inverted index structures, caching, a domain-independent knowledge of strings, and thousands of machines. Several research efforts have attempted to generalize keyword search to keytree and keygraph searching, because trees and graphs have many applications in next-generation database systems. This paper surveys both algorithms and applications, giving some emphasis to our own work.
Representing and querying changes in semistructured data
- In Proceedings of the International Conference on Data Engineering, 1998
- Cited by 77 (10 self)
Semistructured data may be irregular and incomplete and does not necessarily conform to a fixed schema. As with structured data, it is often desirable to maintain a history of changes to data, and to query over both the data and the changes. Representing and querying changes in semistructured data is more difficult than in structured data due to the irregularity and lack of schema. We present a model for representing changes in semistructured data and a language for querying over these changes. An important feature of our approach is that we represent and query changes directly as annotations on the affected data, instead of indirectly as the difference between database states. We describe the implementation of our model and query language. We also describe the design and implementation of a query subscription service that permits users to subscribe to changes in semistructured information sources.
Evaluating Structural Similarity in XML Documents
- 2002
- Cited by 62 (0 self)
XML documents on the web are often found without DTDs, particularly when these documents have been created from legacy HTML. Yet having knowledge of the DTD can be valuable in querying and manipulating such documents. Recent work (cf. [10]) has given us a means to (re-)construct a DTD to describe the structure common to a given set of document instances. However, given a collection of documents with unknown DTDs, it may not be appropriate to construct a single DTD to describe every document in the collection. Instead, we would wish to partition the collection into smaller sets of "similar" documents, and then induce a separate DTD for each such set. It is this partitioning problem that we address in this paper. Given two
Efficient Structural Joins on Indexed XML Documents
- In VLDB, 2002
- Cited by 60 (3 self)
Queries on XML documents typically combine selections on element contents, and, via path expressions, the structural relationships between tagged elements. Structural joins are used to find all pairs of elements satisfying the primitive structural relationships specified in the query, namely, parent-child and ancestor-descendant relationships. Efficient support for structural joins is thus the key to efficient implementations of XML queries. Recently proposed node numbering schemes enable the capturing of the XML document structure using traditional indices (such as B+-trees or R-trees). This paper proposes efficient structural join algorithms in the presence of tag indices. We first concentrate on using B+-trees and show how to expedite a structural join by avoiding collections of elements that do not participate in the join. We then introduce an enhancement (based on sibling pointers) that further improves performance. Such sibling pointers are easily implemented and dynamically maintainable. We also present a structural join algorithm that utilizes R-trees.
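To make the role of node numbering concrete, here is a minimal sketch of an ancestor-descendant join over (start, end) intervals. It is not the B+-tree or R-tree algorithm of the paper: a sorted list with binary search stands in for the tag index, and the interval numbering, the `structural_join` name, and the sample positions are assumptions.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# (start, end) positions from an assumed document-order numbering scheme:
# d is a descendant of a exactly when a.start < d.start and d.end < a.end.
Interval = Tuple[int, int]


def structural_join(
    ancestors: List[Interval], descendants: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Return all (ancestor, descendant) pairs, skipping non-participating elements."""
    descendants = sorted(descendants)
    starts = [d[0] for d in descendants]
    pairs = []
    for a_start, a_end in sorted(ancestors):
        # Binary search jumps over descendants outside (a_start, a_end);
        # intervals of a well-formed document nest, so checking start suffices.
        lo = bisect_right(starts, a_start)
        hi = bisect_left(starts, a_end)
        for d in descendants[lo:hi]:
            pairs.append(((a_start, a_end), d))
    return pairs


# Numbering for <a><b><c/></b><b><c/></b><c/></a>:
# a=(1,12), b=(2,5), c=(3,4), b=(6,9), c=(7,8), c=(10,11)
b_elements = [(2, 5), (6, 9)]
c_elements = [(3, 4), (7, 8), (10, 11)]

pairs = structural_join(b_elements, c_elements)   # evaluates the path b//c
print(pairs)
```

The last `c` element lies outside every `b` interval, so the search range for each ancestor never reaches it; an index-based implementation would avoid reading such elements in a similar way.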
Efficient Management of Multiversion Documents by Object Referencing
- 2001
- Cited by 51 (12 self)
Traditional approaches to versioning semistructured information are edit-based, i.e., subsequent document versions are represented by using edit scripts. This paper proposes a reference-based version management scheme that preserves the logical structure of the evolving document through the use of object references. By preserving the document structure among versions the new scheme facilitates more efficient query support. In particular, we examine queries involving projections and selections on the document versions, as well as queries on the document evolution history. Moreover, we show that the proposed scheme provides an effective representation of multiversioned XML documents, both at the transport and exchange levels. In fact, with the reference-based scheme, a document's history can also be viewed and processed as yet another XML document. Furthermore, we demonstrate the effectiveness of the new scheme at the storage level. In particular, the scheme is enhanced with a usefulness-based page management policy that extends and adapts techniques used in transaction-time databases to ensure efficient clustering of information among versions. An extensive comparison of the reference-based versioning against representations used in temporal databases and persistent object managers depicts the performance advantages of the new approach. Finally it should be noted that reference-based versioning is applicable to other kinds of semistructured information (besides XML documents), and can be used to replace traditional version control schemes, such as the edit-based RCS and the timestamp-based SCCS.

