Results 1 - 10
of
72
Query evaluation techniques for large databases
- ACM COMPUTING SURVEYS
, 1993
"... Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On ..."
Abstract
-
Cited by 592 (7 self)
- Add to MetaCart
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Multidimensional Access Methods
, 1998
"... Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that ..."
Abstract
-
Cited by 508 (3 self)
- Add to MetaCart
Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region). More
Query Optimization for XML
- In Proceedings of VLDB
, 1999
"... XML is an emerging standard for data representation and exchange on the World-Wide Web. Due to the nature of information on the Web and the inherent flexibility of XML, we expect that much of the data encoded in XML will be semistructured:the data may be irregular or incomplete, and its structu ..."
Abstract
-
Cited by 173 (2 self)
- Add to MetaCart
XML is an emerging standard for data representation and exchange on the World-Wide Web. Due to the nature of information on the Web and the inherent flexibility of XML, we expect that much of the data encoded in XML will be semistructured:the data may be irregular or incomplete, and its structure may change rapidly or unpredictably. This paper describes the query processor of Lore,aDBMS for XML-based data supporting an expressive query language. We focus primarily on Lore's cost-based query optimizer. While all of the usual problems associated with cost-based query optimization apply to XML-based query languages, a number of additional problems arise, such as new kinds of indexing, more complicated notions of database statistics, and vastly different query execution strategies for different databases. We define appropriate logical and physical query plans, database statistics, and a cost model, and we describe plan enumeration including heuristics for reducing the large search space. Our optimizer is fully implemented in Lore and preliminary performance results are reported.
Inclusion Of New Types In Relational Data Base Systems
, 1986
"... This paper explores a mechanism to support user-defined data types for columns in a relational data base system. Previous work suggested how to support new operators and new data types. The contribution of this work is to suggest ways to allow query optimization on commands which include new data ty ..."
Abstract
-
Cited by 75 (14 self)
- Add to MetaCart
This paper explores a mechanism to support user-defined data types for columns in a relational data base system. Previous work suggested how to support new operators and new data types. The contribution of this work is to suggest ways to allow query optimization on commands which include new data types and operators and ways to allow access methods to be used for new data types. 1. INTRODUCTION The collection of built-in data types in a data base system (e.g. integer, floating point number, character string) and built-in operators (e.g. +, -, *, /) were motivated by the needs of business data processing applications. However, in many engineering applications this collection of types is not appropriate. For example, in a geographic application a user typically wants points, lines, line groups and polygons as basic data types and operators which include intersection, distance and containment. In scientific application, one requires complex numbers and time series with appropriate operat...
A Polymorphic Record Calculus and Its Compilation
- ACM Transactions on Programming Languages and Systems
, 1995
"... this article appeared in Proceedings of ACM Symposium on Principles of Programming Languages, 1992, under the title \A compilation method for ML-style polymorphic record calculi." This work was partly supported by the Japanese Ministry of Education under scienti c research grant no. 06680319. Author ..."
Abstract
-
Cited by 67 (8 self)
- Add to MetaCart
this article appeared in Proceedings of ACM Symposium on Principles of Programming Languages, 1992, under the title \A compilation method for ML-style polymorphic record calculi." This work was partly supported by the Japanese Ministry of Education under scienti c research grant no. 06680319. Author's address: Research Institute for Mathematical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-01, JAPAN; email: ohori@kurims.kyoto-u.ac.jp Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of ACM. To copy otherwise, or to republish, requires a fee and/or speci c permission. c 1999 ACM 0164-0925/99/0100-0111 $00.75
Indexing Semistructured Data
, 1998
"... This paper describes techniques for building and exploiting indexes on semistructured data: data that may not have a fixed schema and that may be irregular or incomplete. We first present a general framework for indexing values in the presence of automatic type coercion. Then based on Lore, a DBMS ..."
Abstract
-
Cited by 46 (2 self)
- Add to MetaCart
This paper describes techniques for building and exploiting indexes on semistructured data: data that may not have a fixed schema and that may be irregular or incomplete. We first present a general framework for indexing values in the presence of automatic type coercion. Then based on Lore, a DBMS for semistructured data, we introduce four types of indexes and illustrate how they are used during query processing. Our techniques and indexing structures are fully implemented and integrated into the Lore prototype. 1 Introduction We call data that is irregular or that exhibits type and structural heterogeneity semistructured, since it may not conform to a rigid, predefined schema. Such data arises frequently on the Web, or when integrating information from heterogeneous sources. In general, semistructured data can be neither stored nor queried in relational or object-oriented database management systems easily and efficiently. We are developing Lore 1 , a database management system d...
Dynamical Sources in Information Theory: A General Analysis of Trie Structures
- ALGORITHMICA
, 1999
"... Digital trees, also known as tries, are a general purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list tries, and bst-tries ("ternary search tries"). The size and the sear ..."
Abstract
-
Cited by 43 (6 self)
- Add to MetaCart
Digital trees, also known as tries, are a general purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list tries, and bst-tries ("ternary search tries"). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a complete distributional analysis of height of tries is given. The unifying data model used is that of dynamical sources and it encompasses classical models like those of memoryless sources with independent symbols, of finite Markovchains, and of nonuniform densities. The probabilistic behaviour of the main parameters, namely size, path length, or height, appears to be determined by two intrinsic characteristics of the source: the entropy and the probability of letter coincidence. These characteristics are themselves related in a natural way to spectral properties of specific transfer operators of the Ruelle type.
The Log-Structured Merge-Tree (LSM-Tree)
, 1996
"... . High-performance transaction system applications typically insert rows in a History table to provide an activity trace; at the same time the transaction system generates log records for purposes of system recovery. Both types of generated information can benefit from efficient indexing. An example ..."
Abstract
-
Cited by 40 (3 self)
- Add to MetaCart
. High-performance transaction system applications typically insert rows in a History table to provide an activity trace; at the same time the transaction system generates log records for purposes of system recovery. Both types of generated information can benefit from efficient indexing. An example in a well-known setting is the TPC-A benchmark application, modified to support efficient queries on the History for account activity for specific accounts. This requires an index by account-id on the fast-growing History table. Unfortunately, standard disk-based index structures such as the B-tree will effectively double the I/O cost of the transaction to maintain an index such as this in real time, increasing the total system cost up to fifty percent. Clearly a method for maintaining a real-time index at low cost is desirable. The Log-Structured Merge-tree (LSM-tree) is a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts ...
A New Hashing Package for UNIX
, 1991
"... UNIX support of disk oriented hashing was originally provided by dbm [ATT79] and subsequently improved upon in ndbm [BSD86]. In AT&T System V, in-memory hashed storage and access support was added in the hsearch library routines [ATT85]. The result is a system with two incompatible hashing schemes, ..."
Abstract
-
Cited by 28 (1 self)
- Add to MetaCart
UNIX support of disk oriented hashing was originally provided by dbm [ATT79] and subsequently improved upon in ndbm [BSD86]. In AT&T System V, in-memory hashed storage and access support was added in the hsearch library routines [ATT85]. The result is a system with two incompatible hashing schemes, each with its own set of shortcomings. This paper presents the design and performance characteristics of a new hashing package providing a superset of the functionality provided by dbm and hsearch. The new package uses linear hashing to provide efficient support of both memory based and disk based hash tables with performance superior to both dbm and hsearch under most conditions. Introduction Current UNIX systems offer two forms of hashed data access. Dbm and its derivatives provide keyed access to disk resident data while hsearch provides access for memory resident data. These two access methods are incompatible in that memory resident hash tables may not be stored on disk and disk resid...

