Making B+-Trees Cache Conscious in Main Memory
- In Proceedings of the SIGMOD 2000 Conference, 2000
Cited by 45 (3 self)
Previous research has shown that cache behavior is important for main memory index structures. Cache conscious index structures such as Cache Sensitive Search Trees (CSS-Trees) perform lookups much faster than binary search and T-Trees. However, CSS-Trees are designed for decision support workloads with relatively static data. Although B+-Trees are more cache conscious than binary search and T-Trees, their utilization of a cache line is low since half of the space is used to store child pointers. Nevertheless, for applications that require incremental updates, traditional B+-Trees perform well. Our goal is to make B+-Trees as cache conscious as CSS-Trees without increasing their update cost too much. We propose a new indexing technique called “Cache Sensitive B+-Trees” (CSB+-Trees). It is a variant of B+-Trees that stores all the child nodes of any given node contiguously, and keeps only the address of the first child in each node. The rest of the children can be found by adding an offset to that address. Since only one child pointer is stored explicitly, the utilization of a cache line is high. CSB+-Trees support incremental updates in a way similar to B+-Trees. We also introduce two variants of CSB+-Trees. Segmented CSB+-Trees divide the child nodes into segments. Nodes within the same segment are stored contiguously and only pointers to the beginning of each segment are stored explicitly in each node. Segmented CSB+-Trees can reduce the copying cost when there is a split since only one segment needs to be moved. Full...
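The first-child-plus-offset idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the names `Node`, `POOL`, `find_leaf`, and `contains` are hypothetical, and a Python list stands in for the contiguous memory region that holds each group of sibling nodes.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]
    # Position of the first child in the node pool; None marks a leaf.
    # Assumption: all children of a node sit next to each other in the pool.
    first_child: Optional[int] = None


def find_leaf(pool: List[Node], root: int, key: int) -> int:
    """Return the pool position of the leaf that may hold `key`."""
    idx = root
    while pool[idx].first_child is not None:
        node = pool[idx]
        # Pick the child slot from the separator keys, then add that
        # offset to the single stored child position.
        offset = bisect_right(node.keys, key)
        idx = node.first_child + offset
    return idx


def contains(pool: List[Node], root: int, key: int) -> bool:
    return key in pool[find_leaf(pool, root, key)].keys


# One root (position 0) whose three leaf children occupy positions 1..3.
POOL = [
    Node(keys=[10, 20], first_child=1),
    Node(keys=[1, 5]),
    Node(keys=[10, 15]),
    Node(keys=[20, 25]),
]
```

Because each internal node stores one child position instead of one per child, the space saved in a fixed-size node can hold more keys, which is the cache-line utilization argument the abstract makes.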
Indexing Values of Time Sequences
- In Proceedings of the 5th International Conference on Information and Knowledge Management, 1996
Cited by 8 (3 self)
A time sequence is a discrete sequence of values, e.g. temperature measurements, varying over time. Conventional indexes for time sequences are built on the time domain and cannot deal with inverse queries on a time sequence (i.e. computing the times when the values satisfy some conditions). To process an inverse query, the entire time sequence has to be scanned. This paper presents a dynamic indexing technique on the value domain for large time sequences, which can be implemented using regular ordered indexing techniques (e.g. B-trees). Our index (termed IP-index) dramatically improves the query processing time of inverse queries compared to linear scanning. For periodic time sequences that have a limited range and precision on their value domain (most time sequences have this property), the IP-index has an upper bound for insertion time and search time.
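To make the notion of an index on the value domain concrete, here is a minimal sketch. It is not the IP-index itself: it only covers explicitly stored values, a sorted Python list stands in for an ordered index such as a B-tree, and the names `READINGS`, `build_value_index`, and `times_with_value_in` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# Hypothetical sample data: (time, temperature) pairs.
READINGS = [(0, 15.0), (1, 18.5), (2, 22.0), (3, 19.0), (4, 14.0)]


def build_value_index(sequence: List[Tuple[int, float]]) -> List[Tuple[float, int]]:
    """Order the entries by value instead of by time."""
    return sorted((value, time) for time, value in sequence)


def times_with_value_in(index: List[Tuple[float, int]], lo: float, hi: float) -> List[int]:
    """Inverse query: times whose stored value lies in [lo, hi]."""
    start = bisect_left(index, (lo, float("-inf")))
    end = bisect_right(index, (hi, float("inf")))
    return sorted(time for _, time in index[start:end])
```

With this layout, `times_with_value_in(build_value_index(READINGS), 18.0, 20.0)` locates the matching entries by binary search rather than by scanning the whole sequence.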
A Value-Based Indexing Technique for Time Sequences
- 1997
Cited by 2 (2 self)
A time sequence is a discrete sequence of values, e.g. temperature measurements, varying over time. Conventional indexes support efficient querying of explicit (stored) values of time sequences. In real life, time sequences are often viewed as continuous, and it is very important to support efficient querying of implicit values (interpolated values). For inverse queries (i.e., computing the times when the values satisfy some conditions) concerning implicit values, the entire time sequence has to be scanned, which is unacceptable when the volume of the time sequence becomes large. This thesis presents a dynamic indexing technique, termed the IP-index (Interpolation-index), for large time sequences. This index supports efficient processing of inverse queries concerning implicit values. It can be implemented on top of regular ordered indexing techniques such as B-trees. Performance measurements show that this index dramatically improves the query processing time of inverse queries. For peri...
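The full-scan baseline that this abstract argues against can be sketched as follows. The code assumes linear interpolation between consecutive samples, which is only one possible interpolation assumption, and the function name `crossing_times` is hypothetical. It is the scan an index would replace, not the IP-index.

```python
from typing import List, Tuple


def crossing_times(sequence: List[Tuple[float, float]], target: float) -> List[float]:
    """Times at which the linearly interpolated sequence equals `target`.

    `sequence` is a list of (time, value) pairs sorted by time.
    Every segment is visited, so the cost grows with the sequence length.
    """
    times = []
    for (t0, v0), (t1, v1) in zip(sequence, sequence[1:]):
        if v0 == target:
            # The stored (explicit) value already matches.
            times.append(float(t0))
        elif (v0 - target) * (v1 - target) < 0:
            # The target lies strictly between the two samples:
            # solve the linear interpolation for the crossing time.
            times.append(t0 + (target - v0) * (t1 - t0) / (v1 - v0))
    if sequence and sequence[-1][1] == target:
        times.append(float(sequence[-1][0]))
    return times
```

For the samples `[(0, 10.0), (1, 20.0), (2, 10.0)]`, a target of 15.0 is never stored explicitly, yet the interpolated sequence reaches it once on the way up and once on the way down.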
Management of 1-D Sequence Data -- From Discrete to Continuous
- Linköping University, 1999
Cited by 1 (0 self)
Data over ordered domains such as time or linear positions are termed sequence data. Sequence data require special treatment that traditional DBMSs do not provide. Modelling sequence data in traditional (relational) database systems often results in awkward query expressions and bad performance. For this reason, considerable research has been dedicated to supporting sequence data in DBMSs in the last decade. Unfortunately, some important requirements from applications are neglected, i.e., how to support sequence data viewed as continuous under user-defined interpolation assumptions, and how to perform subsequence extraction efficiently based on conditions on the value domain. We term these kinds of queries value queries (in contrast to shape queries that look for general patterns of sequences). This thesis presents pioneering work on supporting value queries on 1-D sequence data based on arbitrary user-defined interpolation functions. An innovative indexing technique, ter...
Redesigning the String Hash Table, Burst Trie, and BST to Exploit Cache
- 2011
A key decision when developing in-memory computing applications is the choice of a mechanism to store and retrieve strings. The most efficient current data structures for this task are the hash table with move-to-front chains and the burst trie, both of which use linked lists as a substructure, and variants of the binary search tree. These data structures are computationally efficient, but typical implementations use large numbers of nodes and pointers to manage strings, which makes inefficient use of cache. In this article, we explore two alternatives to the standard representation: the simple expedient of including the string in its node, and, for linked lists, the more drastic step of replacing each list of nodes by a contiguous array of characters. Our experiments show that, for large sets of strings, the improvement is dramatic. For hashing, in the best case the total space overhead is reduced to less than 1 bit per string. For the burst trie, over 300MB of strings can be stored in a total of under 200MB of memory with significantly improved search time. These results, on a variety of data sets, show that cache-friendly variants of fundamental data structures can yield remarkable gains in performance.
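The second alternative in this abstract, replacing each chain of nodes with a contiguous array of characters, might look roughly like the sketch below. The details are assumptions and not taken from the article: the class name `CompactChainHash`, the one-byte length prefix (which limits strings to 255 bytes), and the use of `zlib.crc32` as the hash function are all illustrative choices.

```python
import zlib


class CompactChainHash:
    """String set whose buckets are flat byte arrays instead of linked nodes."""

    def __init__(self, num_buckets: int = 64) -> None:
        # Each bucket holds its strings back to back as
        # [length byte][string bytes][length byte][string bytes]...
        self.buckets = [bytearray() for _ in range(num_buckets)]

    def _bucket(self, key: bytes) -> bytearray:
        return self.buckets[zlib.crc32(key) % len(self.buckets)]

    def contains(self, word: str) -> bool:
        key = word.encode("utf-8")
        bucket = self._bucket(key)
        pos = 0
        while pos < len(bucket):
            length = bucket[pos]
            if bucket[pos + 1 : pos + 1 + length] == key:
                return True
            pos += 1 + length  # jump to the next length prefix
        return False

    def add(self, word: str) -> None:
        key = word.encode("utf-8")
        if len(key) > 255:
            raise ValueError("this sketch stores the length in a single byte")
        if not self.contains(word):
            bucket = self._bucket(key)
            bucket.append(len(key))
            bucket.extend(key)
```

A lookup walks one contiguous buffer rather than following a pointer per string, which is the kind of access pattern the abstract credits for the cache gains. Python's own object overhead hides that effect, so the sketch shows the layout only, not the performance.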

