Metadata efficiency in versioning file systems
- Conference on File and Storage Technologies (San Francisco, CA, 31 March–02 April 2003)
, 2003
Cited by 75 (11 self)
Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.
An Efficient Multiversion Access Structure
- IEEE Transactions on Knowledge and Data Engineering
, 1997
Cited by 61 (0 self)
Abstract: An efficient multiversion access structure for a transaction-time database is presented. Our method requires optimal storage and query times for several important queries and logarithmic update times. Three version operations (inserts, updates, and deletes) are allowed on the current database, while queries are allowed on any version, present or past. The following query operations are performed in optimal query time: key range search, key history search, and time range view. The key range query retrieves all records having keys in a specified key range at a specified time; the key history query retrieves all records with a given key in a specified time range; and the time range view query retrieves all records that were current during a specified time interval. Special cases of these queries include the key search query, which retrieves a particular version of a record, and the snapshot query, which reconstructs the database at some past time. To the best of our knowledge, no previous multiversion access structure simultaneously supports all these query and version operations within these time and space bounds. The bounds on query operations are worst case per operation, while those for storage space and version operations are (worst-case) amortized over a sequence of version operations. Simulation results show that good storage utilization and query performance are obtained. Index Terms: Transaction-time database, multidimensional data, access methods, data structures, indexing, I/O complexity.
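To make the query types in this abstract concrete, here is a minimal in-memory sketch, assuming each record version carries a half-open lifetime [start, end). It uses linear scans purely for illustration and is not the paper's access structure; all names (Version, key_range_at, key_history, time_range_view, snapshot) are hypothetical.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Version:
    key: int
    value: str
    start: float        # time at which this version became current
    end: float = inf    # time at which it was superseded or deleted


def alive_at(v, t):
    # A version is current at time t if t falls in [start, end).
    return v.start <= t < v.end


def overlaps(v, t1, t2):
    # The lifetime [start, end) intersects the query interval [t1, t2].
    return v.start <= t2 and t1 < v.end


def key_range_at(versions, lo, hi, t):
    # Key range query: keys in [lo, hi] that were current at time t.
    return [v for v in versions if lo <= v.key <= hi and alive_at(v, t)]


def key_history(versions, key, t1, t2):
    # Key history query: all versions of one key during [t1, t2].
    return [v for v in versions if v.key == key and overlaps(v, t1, t2)]


def time_range_view(versions, t1, t2):
    # Time range view: every version that was current at some point in [t1, t2].
    return [v for v in versions if overlaps(v, t1, t2)]


def snapshot(versions, t):
    # Snapshot query: the whole database as of time t.
    return [v for v in versions if alive_at(v, t)]
```

An update is modeled here as closing the old version's lifetime and adding a new version that starts at the same time; a delete only closes the lifetime.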
Efficient Structural Joins on Indexed XML Documents
- In VLDB
, 2002
Cited by 60 (3 self)
Queries on XML documents typically combine selections on element contents and, via path expressions, the structural relationships between tagged elements. Structural joins are used to find all pairs of elements satisfying the primitive structural relationships specified in the query, namely, parent-child and ancestor-descendant relationships. Efficient support for structural joins is thus the key to efficient implementations of XML queries. Recently proposed node numbering schemes enable the capturing of the XML document structure using traditional indices (such as B+-trees or R-trees). This paper proposes efficient structural join algorithms in the presence of tag indices. We first concentrate on using B+-trees and show how to expedite a structural join by avoiding collections of elements that do not participate in the join. We then introduce an enhancement (based on sibling pointers) that further improves performance. Such sibling pointers are easily implemented and dynamically maintainable. We also present a structural join algorithm that utilizes R-trees.
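As a rough illustration of an ancestor-descendant structural join over a node numbering scheme, the sketch below assumes each element is encoded as a (start, end) region and that regions are properly nested, as they are in a well-formed XML tree. A sorted list with binary search stands in for a B+-tree on start positions; this is a simplified stand-in, not the algorithm proposed in the paper, and the function name is hypothetical.

```python
from bisect import bisect_right


def structural_join(ancestors, descendants):
    """Return (a, d) pairs where region a strictly contains region d.

    Both inputs are lists of (start, end) regions. Because regions are
    assumed to be properly nested, a descendant candidate whose start
    lies inside an ancestor's region is fully contained in it.
    """
    descendants = sorted(descendants)
    starts = [d[0] for d in descendants]
    pairs = []
    for a_start, a_end in sorted(ancestors):
        # Jump directly to the first candidate that starts after a_start,
        # skipping every element that cannot participate in the join.
        i = bisect_right(starts, a_start)
        while i < len(descendants) and descendants[i][0] < a_end:
            pairs.append(((a_start, a_end), descendants[i]))
            i += 1
    return pairs
```

A parent-child join would additionally need each element's depth in the tree, which this sketch omits.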
Efficient Indexing of Spatiotemporal Objects
, 2002
Cited by 54 (10 self)
Spatiotemporal objects, i.e., objects which change their position and/or extent over time, appear in many applications. In this paper we examine the problem of indexing large volumes of such data. Important in this environment is how the spatiotemporal objects move and/or change. We consider a rather general case where object movements/changes are defined by combinations of polynomial functions. We further concentrate on "snapshot" as well as small "interval" queries, as these are quite common when examining the history of the gathered data. The obvious approach, which approximates each spatiotemporal object by an MBR and uses a traditional multidimensional access method to index them, is inefficient. Objects that "live" for long time intervals have large MBRs, which introduce a lot of empty space. Clustering long intervals has been dealt with in temporal databases by the use of partially persistent indices. What differentiates this problem from traditional temporal indexing is that objects are allowed to move/change during their lifetime. Better ways are thus needed to approximate general spatiotemporal objects. One obvious solution is to introduce artificial splits: the lifetime of a long-lived object is split into smaller consecutive pieces. This decreases the empty space but increases the number of indexed MBRs. We first give an optimal algorithm and a heuristic for splitting a given spatiotemporal object into a predefined number of pieces. Then, given an upper bound on the total number of possible splits, we present three algorithms that decide how the splits are distributed among all the objects so that the total empty space is minimized. The number of splits cannot be increased indefinitely since the extra objects will eventually affect query performance. Usi...
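The trade-off behind artificial splits can be seen in a small worked example. The sketch below assumes a square object of fixed side length moving with constant velocity and splits its lifetime into equal-length pieces; this equal-length rule is a simplification chosen for illustration, not the optimal algorithm or the heuristic from the paper, and both function names are hypothetical.

```python
def mbr_volume(side, vx, vy, duration):
    """Space-time volume of the MBR enclosing a square of the given side
    that moves with constant velocity (vx, vy) for `duration` time units."""
    width = side + abs(vx) * duration
    height = side + abs(vy) * duration
    return width * height * duration


def total_volume_with_splits(side, vx, vy, lifetime, pieces):
    """Total MBR volume when the lifetime is cut into equal-length pieces.

    With constant velocity every piece has the same MBR shape, so the
    total is simply pieces times the volume of one piece.
    """
    piece = lifetime / pieces
    return pieces * mbr_volume(side, vx, vy, piece)
```

For a unit square moving at velocity (1, 1) over 4 time units, one MBR has volume 100, two pieces total 36, and four pieces total 16: the indexed volume shrinks while the number of indexed MBRs grows.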
Efficient Management of Multiversion Documents by Object Referencing
, 2001
Cited by 51 (12 self)
Traditional approaches to versioning semistructured information are edit-based, i.e., subsequent document versions are represented by using edit scripts. This paper proposes a reference-based version management scheme that preserves the logical structure of the evolving document through the use of object references. By preserving the document structure among versions the new scheme facilitates more efficient query support. In particular, we examine queries involving projections and selections on the document versions, as well as queries on the document evolution history. Moreover, we show that the proposed scheme provides an effective representation of multiversioned XML documents, both at the transport and exchange levels. In fact, with the reference-based scheme, a document's history can also be viewed and processed as yet another XML document. Furthermore, we demonstrate the effectiveness of the new scheme at the storage level. In particular, the scheme is enhanced with a usefulness-based page management policy that extends and adapts techniques used in transaction-time databases to ensure efficient clustering of information among versions. An extensive comparison of the reference-based versioning against representations used in temporal databases and persistent object managers depicts the performance advantages of the new approach. Finally it should be noted that reference-based versioning is applicable to other kinds of semistructured information (besides XML documents), and can be used to replace traditional version control schemes, such as the edit-based RCS and the timestamp-based SCCS.
Indexing Animated Objects Using Spatiotemporal Access Methods
- IEEE Transactions on Knowledge and Data Engineering
, 2001
Cited by 45 (7 self)
Abstract: We present a new approach for indexing animated objects and efficiently answering queries about their position in time and space. In particular, we consider an animated movie as a spatiotemporal evolution. A movie is viewed as an ordered sequence of frames, where each frame is a 2D space occupied by the objects that appear in that frame. The queries of interest are range queries of the form, "find the objects that appear in area S between frames f_i and f_j" as well as nearest neighbor queries such as, "find the q nearest objects to a given position A between frames f_i and f_j." The straightforward approach to index such objects considers the frame sequence as another dimension and uses a 3D access method (such as an R-Tree or its variants). This, however, assigns long "lifetime" intervals to objects that appear through many consecutive frames. Long intervals are difficult to cluster efficiently in a 3D index. Instead, we propose to reduce the problem to a partial-persistence problem. Namely, we use a 2D access method that is made partially persistent. We show that this approach leads to faster query performance while still using storage proportional to the total number of changes in the frame evolution. What differentiates this problem from traditional temporal indexing approaches is that objects are allowed to move and/or change their extent continuously between frames. We present novel methods to approximate such object evolutions. We formulate an optimization problem for which we provide an optimal solution for the case where objects move linearly. Finally, we present an extensive experimental study of the proposed methods. While we concentrate on animated movies, our approach is general and can be applied to other spatiotemporal applications as well. Index Terms: Access methods, spatiotemporal databases, animated objects, multimedia.
Efficient Complex Query Support for Multiversion XML Documents
- In EDBT
, 2002
Cited by 27 (6 self)
Managing multiple versions of XML documents represents a critical requirement for many applications. Also, there has been much recent interest in supporting complex queries on XML data (e.g., regular path expressions, structural projections, DIFF queries). In this paper, we examine the problem of efficiently supporting complex queries on multiversioned XML documents. Our approach relies on a scheme based on durable node numbers (DNNs) that preserve the order among the XML tree nodes and are invariant with respect to updates. Using the document's DNNs, various complex queries are reduced to combinations of partial version retrieval queries. We examine three indexing schemes to efficiently evaluate partial version retrieval queries in this environment. A thorough performance analysis is then presented to reveal the advantages of each scheme.
Efficient Indexing for Constraint and Temporal Databases
- Proc. 6th Int. Conf. on Database Theory (ICDT), LNCS 1186
, 1997
Cited by 26 (0 self)
We examine new I/O-efficient techniques for indexing problems in constraint and temporal data models. We present algorithms for these problems that are considerably simpler than previous solutions. Our solutions are unique in the sense that they only use B+-trees rather than special-purpose data structures. Indexing for many general constraint data models can be reduced to interval intersection. We present a new algorithm for this problem using a query-time/space tradeoff, which achieves the optimal query time O(log_B n + t/B) I/Os in linear space O(n/B) using B+-trees. (Here, n is the number of intervals, t the number of intervals in the output of a query, and B the disk block size.) It is easy to update this data structure, but small worst-case bounds do not seem possible. Previous approaches have achieved these bounds but are fairly complex and rely mostly on reducing the interval intersection problem to special cases of two-dimensional search. Some of them c...
Metadata Efficiency in a Comprehensive Versioning File System
- In Proceedings of USENIX Conference on File and Storage Technologies
, 2002
Cited by 21 (2 self)
A comprehensive versioning file system creates and retains a new file version for every WRITE or other modification request. The resulting history of file modifications provides a detailed view to tools and administrators seeking to investigate a suspect system state. Conventional versioning systems do not efficiently record the many prior versions that result. In particular, the versioned metadata they keep consumes almost as much space as the versioned data. This paper examines two space-efficient metadata structures for versioning file systems and describes their integration into the Comprehensive Versioning File System (CVFS). Journal-based metadata encodes each metadata version into a single journal entry; CVFS uses this structure for inodes and indirect blocks, reducing the associated space requirements by 80%. Multiversion b-trees extend the per-entry key with a timestamp and keep current and historical entries in a single tree; CVFS uses this structure for directories, reducing the associated space requirements by 99%. Experiments with CVFS verify that its current-version performance is similar to that of non-versioning file systems. Although access to historical versions is slower than conventional versioning systems, checkpointing is shown to mitigate this effect.
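The idea of extending the per-entry key with a timestamp can be sketched as follows. This is a minimal illustration, assuming a sorted in-memory list as a stand-in for the tree and a None inode as a deletion marker; it is not CVFS's multiversion b-tree, and the class and method names are hypothetical.

```python
from bisect import bisect_right


class VersionedDirectory:
    """Directory whose entries are keyed by (name, timestamp).

    Current and historical entries live in one sorted structure, so a
    lookup "as of" any time is a single ordered search.
    """

    def __init__(self):
        self._keys = []    # sorted (name, timestamp) pairs
        self._inodes = []  # inode number, or None for a deletion marker

    def record(self, name, timestamp, inode):
        # Insert a new version of the entry; older versions are kept.
        i = bisect_right(self._keys, (name, timestamp))
        self._keys.insert(i, (name, timestamp))
        self._inodes.insert(i, inode)

    def lookup(self, name, timestamp):
        # Find the latest version of `name` recorded at or before `timestamp`.
        i = bisect_right(self._keys, (name, timestamp))
        if i == 0 or self._keys[i - 1][0] != name:
            return None  # the name did not exist yet at that time
        return self._inodes[i - 1]
```

Because entries for one name are adjacent and ordered by time, the same structure could also return a name's full history with a short range scan.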
Design, implementation, and performance of the LHAM log-structured history data access method
- In Proceedings of the 24th VLDB Conference
, 1998
Cited by 21 (0 self)
Numerous applications such as stock market or medical information systems require that both historical and current data be logically integrated into a temporal database. The underlying access method must support different forms of "time-travel" queries, the migration of old record versions onto inexpensive archive media, and high insert and update rates. This paper introduces a new access method for transaction-time temporal data, called the Log-structured History Data Access Method (LHAM), that meets these demands. The basic principle of LHAM is to partition the data into successive components based on the timestamps of the record versions. Components are assigned to different levels of a storage hierarchy, and incoming data is continuously migrated through the hierarchy. The paper discusses the LHAM concepts, including concurrency control and recovery, our full-fledged LHAM implementation, and experimental performance results based on this implementation. A detailed comparison with the TSB-tree, both analytically and based on experiments with real implementations, shows that LHAM is highly superior in terms of insert performance, while query performance is in almost all cases at least as good as for the TSB-tree; in many cases it is much better.

