Results 1 - 10 of 36
The grid file: An adaptable, symmetric multikey file structure
- ACM Transactions on Database Systems, 1984
Cited by 362 (4 self)
Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access. They manifest various deficiencies, in particular for multikey access to highly dynamic files. We study the dynamic aspects of file structures that treat all keys symmetrically, that is, file structures which avoid the distinction between primary and secondary keys. We start from a bitmap approach and treat the problem of file design as one of data compression of a large sparse matrix. This leads to the notions of a grid partition of the search space and of a grid directory, which are the keys to a dynamic file structure called the grid file. This file system adapts gracefully to its contents under insertions and deletions, and thus achieves an upper bound of two disk accesses for single record retrieval; it also handles range queries and partially specified queries efficiently. We discuss in detail the design decisions that led to the grid file, present simulation results of its behavior, and compare it to other multikey access file structures.
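A minimal sketch of how a grid partition and a grid directory can give a two-access lookup. This is an illustration and not the authors' implementation: the class name `GridFileSketch`, the two-key restriction, the in-memory dictionaries standing in for disk pages, and the static (non-splitting) layout are all assumptions made here.

```python
from bisect import bisect_right


class GridFileSketch:
    """Toy two-key grid file: linear scales, a grid directory, and buckets.

    Hypothetical illustration only; splitting and merging are omitted.
    """

    def __init__(self, x_scale, y_scale, directory, buckets):
        self.x_scale = x_scale      # sorted split points on key 1
        self.y_scale = y_scale      # sorted split points on key 2
        self.directory = directory  # directory[i][j] -> bucket id
        self.buckets = buckets      # bucket id -> list of (x, y, payload)

    def cell(self, x, y):
        # The scales are assumed small enough to stay in main memory.
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def lookup(self, x, y):
        i, j = self.cell(x, y)
        bucket_id = self.directory[i][j]  # access 1: directory page
        bucket = self.buckets[bucket_id]  # access 2: data bucket
        return [p for (bx, by, p) in bucket if (bx, by) == (x, y)]

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        i_lo, j_lo = self.cell(x_lo, y_lo)
        i_hi, j_hi = self.cell(x_hi, y_hi)
        # Several grid cells may share one bucket, so collect distinct ids.
        ids = {self.directory[i][j]
               for i in range(i_lo, i_hi + 1)
               for j in range(j_lo, j_hi + 1)}
        return sorted(p for b in ids for (bx, by, p) in self.buckets[b]
                      if x_lo <= bx <= x_hi and y_lo <= by <= y_hi)


gf = GridFileSketch(
    x_scale=[50], y_scale=[50],
    directory=[[0, 1], [2, 2]],  # two cells share bucket 2
    buckets={0: [(10, 10, "a")], 1: [(10, 90, "b")],
             2: [(70, 20, "c"), (80, 80, "d")]},
)
```

Both keys are handled the same way in `cell`, which is the symmetry the abstract refers to; a partially specified query would simply scan a whole row or column of the directory.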
A Data Transformation System for Biological Data Sources
- In Proceedings of 21st International Conference on Very Large Data Bases, 1995
Cited by 69 (19 self)
Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.
1 Introduction
The goal of the Human Genome Project (HGP) is to sequence the 24 distinct chromosomes comprising the human genome. Much of the information associated with the HGP resides not in conventional databases, but in files that have been formatted according to a variety of conventions. These...
Dynamic Hashing Schemes
- ACM Computing Surveys, 1988
Cited by 41 (1 self)
A new type of dynamic file access called dynamic hashing has recently emerged. It promises the flexibility of handling dynamic files while preserving the fast access times expected from hashing. Such a fast, dynamic file access scheme is needed to support modern database systems. This paper surveys dynamic hashing schemes and examines
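As one concrete instance of a dynamic hashing scheme, the sketch below shows extendible hashing, in which a directory doubles and a single bucket splits when it overflows. The abstract does not say which schemes the survey covers in detail, so the choice of extendible hashing, the names `Bucket` and `ExtendibleHash`, and the tiny bucket capacity are assumptions made for illustration.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # local depth: number of hash bits this bucket uses
        self.items = {}


class ExtendibleHash:
    """Toy extendible hash table (hypothetical sketch, in-memory only)."""

    def __init__(self, capacity=2):
        self.capacity = capacity      # assumed bucket size, kept tiny on purpose
        self.global_depth = 0
        self.directory = [Bucket(0)]  # indexed by the low global_depth hash bits

    def _slot(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key, value):
        # Caveat: loops forever if more than `capacity` keys share a full hash.
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory; both halves point at the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bit = 1 << bucket.depth
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        # Move the keys whose newly examined hash bit is 1.
        for k in [k for k in bucket.items if hash(k) & bit]:
            sibling.items[k] = bucket.items.pop(k)
        # Redirect the directory slots that have that bit set.
        for slot, b in enumerate(self.directory):
            if b is bucket and slot & bit:
                self.directory[slot] = sibling


table = ExtendibleHash(capacity=2)
for i in range(100):
    table.put(i, i * i)
```

Only the overflowing bucket is reorganized on growth, so a lookup stays a single directory probe followed by a single bucket probe regardless of file size.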
Adaptable Pointer Swizzling Strategies in Object Bases: Design, Realization, and Quantitative Analysis
- 1993
Cited by 32 (3 self)
In this paper, different approaches are classified and evaluated for optimizing the access to main-memory resident persistent objects, techniques which are commonly referred to as "pointer swizzling". To speed up the access along inter-object references, the persistent pointers in the form of unique object identifiers (OIDs) are transformed (swizzled) into main-memory pointers (addresses). Pointer swizzling techniques can be divided into two classes: (1) strategies that allow replacement of swizzled objects from the buffer before the end of an application program and (2) those that rule out the displacement of swizzled objects. Whereas the latter class of pointer swizzling methods has received much attention in recent literature, the first class (i.e., techniques that take "precautions" for the replacement of swizzled objects) has not yet been thoroughly investigated. Four different pointer swizzling techniques allowing object replacement were investigated and contrasted with the p...
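The idea of swizzling an OID into a direct reference, and of undoing it so that an object can be replaced, can be sketched as follows. This is a simplified illustration under stated assumptions, not one of the four techniques evaluated in the paper: the names `OID`, `Obj`, and `ObjectBuffer` are hypothetical, swizzling is done lazily on first dereference, and eviction finds incoming references by scanning all resident objects.

```python
class OID:
    """Persistent reference: an object identifier, not a memory address."""

    def __init__(self, value):
        self.value = value


class Obj:
    def __init__(self, oid_value, fields):
        self.oid_value = oid_value
        self.fields = fields  # field name -> plain value, OID, or Obj


class ObjectBuffer:
    """Toy object buffer with lazy swizzling (hypothetical sketch)."""

    def __init__(self, store):
        self.store = store  # oid value -> raw fields (simulated disk)
        self.resident = {}  # oid value -> Obj currently in memory
        self.faults = 0

    def load(self, oid_value):
        obj = self.resident.get(oid_value)
        if obj is None:
            self.faults += 1  # simulated object fault
            obj = Obj(oid_value, dict(self.store[oid_value]))
            self.resident[oid_value] = obj
        return obj

    def deref(self, obj, field):
        ref = obj.fields[field]
        if isinstance(ref, OID):
            # Unswizzled: resolve through the table, then store the pointer.
            ref = self.load(ref.value)
            obj.fields[field] = ref
        return ref

    def evict(self, oid_value):
        # Precaution for replacement: turn every direct reference to the
        # victim back into an OID. Updates to the victim are not written back.
        victim = self.resident.pop(oid_value)
        for obj in self.resident.values():
            for name, ref in obj.fields.items():
                if ref is victim:
                    obj.fields[name] = OID(oid_value)


store = {1: {"next": OID(2), "name": "a"},
         2: {"next": OID(1), "name": "b"}}
buf = ObjectBuffer(store)
```

A buffer that never evicts could skip the scan in `evict` entirely, which is the trade-off between the two classes named in the abstract.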
A Performance Evaluation of OID Mapping Techniques
- 1995
Cited by 31 (6 self)
In this paper, three techniques to implement logical OIDs are thoroughly evaluated: hashing, B-trees and a technique called direct mapping. Among these three techniques, direct mapping is the most robust; it induces at most one page fault to map an OID, and it scales very well to large, rapidly growing databases. Furthermore, the clustering of handles that are used to map logical OIDs is studied. In particular, the performance of B-trees and direct mapping can improve significantly if the handles of objects that are frequently accessed by the same methods are clustered. For direct mapping, two placement policies are compared: linear and matrix clustering.
1 Introduction
The full support of object identity is one of the most important features of object-oriented database systems [KM94]. To improve referential integrity, an object base system allocates an object identifier (OID) to every object at the time the object is created. The OID is used to identify the object uniquely and to im...
Dynamic Maintenance of Data Distribution for Selectivity Estimation
- The VLDB Journal, 1994
Cited by 21 (9 self)
We propose a new dynamic method for multidimensional selectivity estimation for range queries that works accurately independent of data distribution. Good estimation of selectivity is important for query optimization and physical database design. Our method employs the Multilevel Grid File (MLGF) for accurate estimation of multidimensional data distribution. The MLGF is a dynamic hierarchical balanced multidimensional file structure that gracefully adapts to nonuniform and correlated distributions. We show that the MLGF directory naturally represents a multidimensional data distribution. We then extend it for further refinement and present the selectivity estimation method based on the MLGF. Extensive experiments have been performed to test the accuracy of selectivity estimation. The results show that estimation errors are very small independent of distributions even with correlated and/or highly-skewed ones. Finally, we analyze the cause of errors in estimation and investigate the eff...
Burst Tries: A Fast, Efficient Data Structure for String Keys
- ACM Transactions on Information Systems, 2002
Cited by 21 (10 self)
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
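The abstract names the burst trie without describing its mechanics. The sketch below follows the general idea of a trie whose leaves are small containers that are "burst" into a new trie node once they grow too large. The names `Container` and `BurstTrie`, the dictionary-based containers, and the size-based burst threshold `limit` are assumptions for illustration; the paper explores its own container types and burst heuristics.

```python
class Container:
    """Small bucket of string suffixes with counters (assumed representation)."""

    def __init__(self):
        self.counts = {}


class BurstTrie:
    """Toy burst trie counting string occurrences (hypothetical sketch)."""

    def __init__(self, limit=4):
        self.limit = limit  # assumed threshold: burst when a container exceeds it
        self.root = {}      # trie node: char -> child; "" -> end-of-string count

    def add(self, word):
        node = self.root
        for i, ch in enumerate(word):
            child = node.get(ch)
            if child is None:
                child = node[ch] = Container()
            if isinstance(child, Container):
                suffix = word[i + 1:]
                child.counts[suffix] = child.counts.get(suffix, 0) + 1
                if len(child.counts) > self.limit:
                    node[ch] = self._burst(child)
                return
            node = child
        node[""] = node.get("", 0) + 1

    def _burst(self, container):
        # Replace the container by a trie node with one container per
        # leading character of the stored suffixes.
        node = {}
        for suffix, count in container.counts.items():
            if suffix == "":
                node[""] = count
            else:
                sub = node.setdefault(suffix[0], Container())
                sub.counts[suffix[1:]] = count
        return node

    def count(self, word):
        node = self.root
        for i, ch in enumerate(word):
            child = node.get(ch)
            if child is None:
                return 0
            if isinstance(child, Container):
                return child.counts.get(word[i + 1:], 0)
            node = child
        return node.get("", 0)

    def items(self, node=None, prefix=""):
        # In-order traversal; only small containers need sorting.
        node = self.root if node is None else node
        for ch in sorted(node):
            child = node[ch]
            if ch == "":
                yield prefix, child
            elif isinstance(child, Container):
                for suffix in sorted(child.counts):
                    yield prefix + ch + suffix, child.counts[suffix]
            else:
                yield from self.items(child, prefix + ch)


words = ["the", "then", "there", "that", "this", "to", "the", "a", "an", "the"]
bt = BurstTrie(limit=2)
for w in words:
    bt.add(w)
```

Frequent prefixes end up in trie nodes while rare strings stay in containers, which is consistent with the abstract's remark that the structure suits skewed frequency distributions.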
The Architecture of the Dalí Main Memory Storage Manager
- Multimedia Tools and Applications, 1997
Cited by 18 (6 self)
Dalí's architecture, illustrated in Figure 2, is organized in multiple layers of abstraction to support the toolkit approach discussed earlier. At the highest level, users can interact with Dalí's relational manager. Below that level is what we call the heap-file/indexing layer, which provides support for fixed-length and variable-length collections, as well as template-based indexing abstractions. In general, at this level, a user does not need to interact with individual locks or latches. (A latch is a short-term lock implemented by a high-speed mutual exclusion mechanism.) Instead, the user specifi...
[Figure 1. Architecture of the Dalí system. Page footer in the extracted text: Bell Labs Technical Journal, Winter 1997.]
Modeling the Storage Architectures of Commercial Database Systems
- ACM Transactions on Database Systems, 1985
Cited by 17 (4 self)
Modeling the storage structures of a DBMS is a prerequisite to understanding and optimizing database performance. Previously, such modeling was very difficult because the fundamental role of conceptual-to-internal mappings in DBMS implementations went unrecognized. In this paper we present a model of physical databases, called the transformation model, that makes conceptual-to-internal mappings explicit. By exposing such mappings, we show that it is possible to model the storage architectures (i.e., the storage structures and mappings) of many commercial DBMSs in a precise, systematic, and comprehendible way. Models of the INQUIRE, ADABAS, and SYSTEM 2000 storage architectures are presented as examples of the model’s utility. We believe the transformation model helps bridge the gap between physical database theory and practice. It also reveals the possibility of a technology to automate the development of physical database software.

