Archiving scientific data
In ACM SIGMOD, 2002
Cited by 97 (8 self)
Archiving is important for scientific data, where it is necessary to record all past versions of a database in order to verify findings based upon a specific version. Much scientific data is held in a hierarchical format and has a key structure that provides a canonical identification for each element of the hierarchy. In this article, we exploit these properties to develop an archiving technique that is both efficient in its use of space and preserves the continuity of elements through versions of the database, something that is not provided by traditional minimum-edit-distance diff approaches. The approach also uses timestamps. All versions of the data are merged into one hierarchy where an element appearing in multiple versions is stored only once along with a timestamp. By identifying the semantic continuity of elements and merging them into one data structure, our technique is capable of providing meaningful change descriptions, and the archive allows us to easily answer certain temporal queries, such as retrieving any specific version from the archive and finding the history of an element. This is in contrast with approaches that store a sequence of deltas where such operations may require undoing a large number of changes or significant reasoning with the
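The merge-with-timestamps idea in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a flat mapping from keys to string values instead of a keyed hierarchy, and it records a plain set of version numbers per value rather than a compact timestamp encoding. The class name `Archive` and its methods are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple


class Archive:
    """Toy key-based archive (illustrative only).

    Each distinct value of a keyed element is stored once, together with
    the set of version numbers in which the element had that value.
    """

    def __init__(self) -> None:
        # key -> {value -> set of versions where key had this value}
        self._store: Dict[Hashable, Dict[str, Set[int]]] = {}
        self.latest = 0

    def add_version(self, snapshot: Dict[Hashable, str]) -> int:
        """Merge a full snapshot into the archive and return its version number."""
        self.latest += 1
        for key, value in snapshot.items():
            self._store.setdefault(key, {}).setdefault(value, set()).add(self.latest)
        return self.latest

    def retrieve(self, version: int) -> Dict[Hashable, str]:
        """Rebuild one version by filtering on timestamps; no deltas are replayed."""
        result: Dict[Hashable, str] = {}
        for key, values in self._store.items():
            for value, versions in values.items():
                if version in versions:
                    result[key] = value
        return result

    def history(self, key: Hashable) -> List[Tuple[int, Optional[str]]]:
        """Return (version, value) pairs for one key; value is None when absent."""
        values = self._store.get(key, {})
        timeline: List[Tuple[int, Optional[str]]] = []
        for version in range(1, self.latest + 1):
            found = next((v for v, vs in values.items() if version in vs), None)
            timeline.append((version, found))
        return timeline
```

Because elements are matched by key rather than by position, an unchanged element costs one stored value no matter how many versions contain it, and both version retrieval and element history become simple scans over the timestamps.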
A Content-Driven Reputation System for the Wikipedia
Cited by 66 (7 self)
On-line forums for the collaborative creation of bodies of information are a phenomenon of rising importance; the Wikipedia is one of the best-known examples. The open nature of such forums could benefit from a notion of reputation for its authors. Author reputation could be used to flag new contributions from low-reputation authors, and it could be used to allow only authors with good reputation to contribute to controversial or critical pages. A reputation system for the Wikipedia would also provide an incentive to give high-quality contributions. We present in this paper a novel type of content-driven reputation system for Wikipedia authors. In our system, authors gain reputation when the edits and text additions they perform to Wikipedia articles are long-lived, and they lose reputation when their changes are undone in short order. We have implemented the proposed system, and we have used it to analyze the entire Italian and French Wikipedias, consisting of a total of 691,551 pages and 5,587,523 revisions. Our results show that our notion of reputation has good predictive value: changes performed by low-reputation authors have a significantly larger than average probability of having poor quality, and of being undone.
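A rough sketch of the gain/lose rule described above follows. The scoring formula is an assumption made for illustration and is not the one used in the paper: each revision's added words are checked against the next few revisions by other authors, and the author's score moves up when the words survive and down when they are removed. The function names and the `horizon` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Revision = Tuple[str, List[str]]  # (author, words of the page after the edit)


def added_words(prev: Sequence[str], curr: Sequence[str]) -> Counter:
    """Multiset of words present in curr but not in prev."""
    return Counter(curr) - Counter(prev)


def update_reputations(revisions: Sequence[Revision], horizon: int = 2) -> Dict[str, float]:
    """Score authors by how well their additions survive later revisions.

    Each later revision (by a different author, within `horizon` steps)
    contributes a value in [-1, 1]: +1 if all added words survive,
    -1 if none do. This weighting is an illustrative assumption.
    """
    reputation: Dict[str, float] = defaultdict(float)
    for i in range(1, len(revisions)):
        author, text = revisions[i]
        added = added_words(revisions[i - 1][1], text)
        total = sum(added.values())
        if total == 0:
            continue  # nothing was added, so there is nothing to judge
        for judge, later_text in revisions[i + 1 : i + 1 + horizon]:
            if judge == author:
                continue  # authors do not judge their own text
            surviving = sum((added & Counter(later_text)).values())
            reputation[author] += (2 * surviving - total) / total
    return dict(reputation)
```

A production system would also have to track text position and edit distance rather than bags of words; the sketch only shows the direction of the reputation updates.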
Evaluating Structural Similarity in XML Documents
2002
Cited by 62 (0 self)
XML documents on the web are often found without DTDs, particularly when these documents have been created from legacy HTML. Yet having knowledge of the DTD can be valuable in querying and manipulating such documents. Recent work (cf. [10]) has given us a means to (re-)construct a DTD to describe the structure common to a given set of document instances. However, given a collection of documents with unknown DTDs, it may not be appropriate to construct a single DTD to describe every document in the collection. Instead, we would wish to partition the collection into smaller sets of "similar" documents, and then induce a separate DTD for each such set. It is this partitioning problem that we address in this paper. Given two
Change-Centric Management of Versions in an XML Warehouse
In Proceedings of VLDB 2001, 2001
Cited by 60 (6 self)
We present a change-centric method to manage versions in a Web WareHouse of XML data. The starting point is a sequence of snapshots of XML documents we obtain from the web. By running a diff algorithm, we compute the changes between two consecutive versions. We then represent the sequence using a novel representation of changes based on completed deltas and persistent identifiers. We present the foundations of the logical representation and some aspects of the physical storage policy. The work presented here was developed in the context of the Xyleme project, a massive warehouse for XML data from the Web. It has been implemented and tested. We briefly discuss the implementation.
A dynamic warehouse for XML data of the Web
IEEE Data Engineering Bulletin, 2001
Cited by 53 (0 self)
Xyleme is a dynamic warehouse for XML data of the Web supporting query evaluation, change control and data integration. We briefly present our motivations, the general architecture and some aspects of Xyleme. The project we describe here was completed at the end of 2000. A prototype
Temporal Slicing in the Evaluation of XML Queries
In VLDB, 2003
Cited by 34 (4 self)
As with relational data, XML data changes over time with the creation, modification, and deletion of XML documents. Expressing queries on time-varying (relational or XML) data is more difficult than writing queries on nontemporal data. In this paper, we present a temporal XML query language, τXQuery, in which we add valid time support to XQuery by minimally extending the syntax and semantics of XQuery. We adopt a stratum approach which maps a τXQuery query to a conventional XQuery query. The paper focuses on how to perform this mapping, in particular, on mapping sequenced queries, which are by far the most challenging. The critical issue of supporting sequenced queries (in any query language) is time-slicing the input data while retaining period timestamping. Timestamps are distributed throughout an XML document, rather than uniformly in tuples, complicating the temporal slicing while also providing opportunities for optimization.
Tracking Changes During Ontology Evolution
In Proceedings of the 3rd International Semantic Web Conference (ISWC 2004), 2004
Cited by 31 (1 self)
As ontology development becomes a collaborative process, developers face the problem of maintaining versions of ontologies akin to maintaining versions of software code or versions of documents in large projects. Traditional versioning systems enable users to compare versions, examine changes, and accept or reject changes. However, while versioning systems treat software code and text documents as text files, a versioning system for ontologies must compare and present structural changes rather than changes in text representation of ontologies. In this paper, we present the PROMPTDIFF ontology-versioning environment, which addresses these challenges. PROMPTDIFF includes an efficient version-comparison algorithm that produces a structural diff between ontologies. The results are presented to the users through an intuitive user interface for analyzing the changes that enables users to view concepts and groups of concepts that were added, deleted, and moved, distinguished by their appearance and with direct access to additional information characterizing the change. The users can then act on the changes, accepting or rejecting them. We present results of a
Observing Transaction-time Semantics with TTXPath
In WISE
Cited by 28 (4 self)
Transaction time is the time of database transactions that create, modify, or destroy facts. It is used to record when facts exist in a database. Accounting for transaction time is essential to supporting audit queries that delve into past database states and differential queries that pinpoint differences between two states. In a web context, transaction time is a problematic concept because there are no transactions. Browsers and other consumers of web data observe snapshots of resources like XML documents but are rarely active participants in their creation or destruction. This paper presents the TTXPath data model and query language. TTXPath extends XPath with support for transaction time. XPath is a specification language for locations in an XML document. It serves as the basis for XML query languages like XSLT and the XML Query Algebra. XPath has no temporal semantics. To construct a TTXPath data model, snapshots of an XML document are obtained over time by an observer. The snapshots are then merged and transaction times are associated with each edge and node. The TTXPath query language extends XPath with a transaction-time axis to enable a query to access past or future states, and with constructs to extract and compare times. TTXPath maximally reuses XPath; hence the changes needed to support transaction time are minimal, and TTXPath is fully backwards-compatible with XPath.
A methodology for clustering XML documents by structure
Information Systems, 2006
Cited by 20 (0 self)
The processing and management of XML data are popular research issues. However, operations based on the structure of XML data have not received strong attention. These operations involve, among others, the grouping of structurally similar XML documents. Such grouping results from the application of clustering methods with distances that estimate the similarity between tree structures. This paper presents a framework for clustering XML documents by structure. Modeling the XML documents as rooted ordered labeled trees, we study the usage of structural distance metrics in hierarchical clustering algorithms to detect groups of structurally similar XML documents. We suggest the usage of structural summaries for trees to improve the performance of the distance calculation and at the same time to maintain or even improve its quality. Our approach is tested using a prototype testbed.
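The pipeline this abstract outlines (summarize each tree, measure structural distance, cluster hierarchically) might look roughly like the sketch below. Two substitutions are made for brevity and should not be read as the paper's method: the structural summary is simply the set of root-to-node tag paths, and the distance is Jaccard distance on those sets rather than a tree edit distance. The function names and the `threshold` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def path_set(xml_text: str) -> Set[str]:
    """Crude structural summary: the set of root-to-node tag paths."""
    paths: Set[str] = set()

    def walk(node: ET.Element, prefix: str) -> None:
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def distance(a: Set[str], b: Set[str]) -> float:
    """Jaccard distance between two summaries (0 = same paths, 1 = disjoint)."""
    return 1.0 - len(a & b) / len(a | b)


def single_link_clusters(docs: List[str], threshold: float) -> List[List[int]]:
    """Agglomerative single-link clustering; stops merging above threshold."""
    summaries = [path_set(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best = None  # (distance, index of first cluster, index of second)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(
                    distance(summaries[i], summaries[j])
                    for i in clusters[x]
                    for j in clusters[y]
                )
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break  # the closest pair is still too far apart
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

Comparing summaries instead of full trees keeps each distance computation cheap, which is the performance motivation the abstract gives for structural summaries; the quality of the clusters depends on how faithful the summary is.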
Formal Verification of E-Services and Workflows
Proc. ESSW, 2002
Cited by 17 (1 self)
We study the verification problem for e-service (and workflow) specifications, aiming at efficient techniques for guiding the construction of composite e-services to guarantee desired properties (e.g., deadlock avoidance, bounds on resource usage, response times). Based on previously proposed e-service frameworks such as AZTEC and e-FLow, the decision flow language Vortex, and our early work on verifying Vortex specifications using model checking and infinite state verification tools, we introduce a very simple e-service model for our investigation of verification issues. We first show how three different model checking techniques are applied to verification of specifications in the simple e-service model, where the number of processes is limited to a predetermined number. We then introduce pid quantified constraints, a new symbolic representation that can encode infinite system states, to verify systems with unbounded and dynamic process instantiations. We think that it is a versatile technique and more suitable for verification of e-service specifications. If this is combined with other techniques such as abstraction and widening, it is possible to solve a large category of interesting verification problems for e-services.

