Results 1  10
of
278
Opportunistic Data Structures with Applications
, 2000
"... In this paper we address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy is decreased when the input is compressible and this space ..."
Abstract

Cited by 290 (11 self)
 Add to MetaCart
(Show Context)
In this paper we address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy is decreased when the input is compressible and this space reduction is achieved at no significant slowdown in the query performance. More precisely, its space occupancy is optimal in an informationcontent sense because a text T [1, u] is stored using O(H k (T )) + o(1) bits per input symbol in the worst case, where H k (T ) is the kth order empirical entropy of T (the bound holds for any fixed k). Given an arbitrary string P [1; p], the opportunistic data structure allows to search for the occ occurrences of P in T in O(p + occ log u) time (for any fixed > 0). If data are uncompressible we achieve the best space bound currently known [12]; on compressible data our solution improves the succinct suffix array of [12] and the classical suffix tree and suffix array data structures either in space or in query time or both.
Trajectory Sampling for Direct Traffic Observation
, 2001
"... Traffic measurement is a critical component for the control and engineering of communication networks. We argue that traffic measurement should make it possible to obtain the spatial flow of traffic through the domain, i.e., the paths followed by packets between any ingress and egress point of the d ..."
Abstract

Cited by 250 (30 self)
 Add to MetaCart
(Show Context)
Traffic measurement is a critical component for the control and engineering of communication networks. We argue that traffic measurement should make it possible to obtain the spatial flow of traffic through the domain, i.e., the paths followed by packets between any ingress and egress point of the domain. Most resource allocation and capacity planning tasks can benefit from such information. Also, traffic measurements should be obtained without a routing model and without knowledge of network state. This allows the traffic measurement process to be resilient to network failures and state uncertainty. We propose a method that allows the direct inference of traffic flows through a domain by observing the trajectories of a subset of all packets traversing the network. The key advantages of the method are that (i) it does not rely on routing state, (ii) its implementation cost is small, and (iii) the measurement reporting traffic is modest and can be controlled precisely. The key idea of the method is to sample packets based on a hash function computed over the packet content. Using the same hash function will yield the same sample set of packets in the entire domain, and enables us to reconstruct packet trajectories. I.
Compressed suffix arrays and suffix trees with applications to text indexing and string matching
, 2005
"... The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for spaceefficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. ..."
Abstract

Cited by 239 (19 self)
 Add to MetaCart
The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for spaceefficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. The text T can be represented in n lg Σ  bits by encoding each symbol with lg Σ  bits. The goal is to support fast online queries for searching any string pattern P of m symbols, with T being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require Ω(n lg n) additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need Ω(n) memory words, each of Ω(lg n) bits. These indexes are larger than the text itself by a multiplicative factor of Ω(lg Σ  n), which is significant when Σ is of constant size, such as in ascii or unicode. On the other hand, these indexes support fast searching, either in O(m lg Σ) timeorinO(m +lgn) time, plus an outputsensitive cost O(occ) for listing the occ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast O(m / lg Σ  n +lgɛ Σ  n) search time in the worst case, for any constant
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract

Cited by 238 (10 self)
 Add to MetaCart
(Show Context)
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
Generalized Search Trees for Database Systems
 IN PROC. 21 ST INTERNATIONAL CONFERENCE ON VLDB
, 1995
"... This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility which only ..."
Abstract

Cited by 236 (18 self)
 Add to MetaCart
(Show Context)
This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility which only supported the traditional set of equality and range predicates. In a single data structure, the GiST provides all the basic search tree logic required by a database system, thereby unifying disparate structures such as B+trees and Rtrees in a single piece of code, and opening the application of search trees to general extensibility. To illustrate the exibility of the GiST, we provide simple method implementations that allow it to behave like a B+tree, an Rtree, and an RDtree, a new index for data with setvalued attributes. We also present a preliminary performance analysis of RDtrees, which leads to discussion on the nature of tree indices and how they behave for various datasets.
Cuckoo hashing
 JOURNAL OF ALGORITHMS
, 2001
"... We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. (Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738–761, 1994). The space usage is similar to that ..."
Abstract

Cited by 199 (7 self)
 Add to MetaCart
We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. (Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738–761, 1994). The space usage is similar to that of binary search trees, i.e., three words per key on average. Besides being conceptually much simpler than previous dynamic dictionaries with worst case constant lookup time, our data structure is interesting in that it does not use perfect hashing, but rather a variant of open addressing where keys can be moved back in their probe sequences. An implementation inspired by our algorithm, but using weaker hash functions, is found to be quite practical. It is competitive with the best known dictionaries having an average case (but no nontrivial worst case) guarantee.
A Simple Algorithm for Nearest Neighbor Search in High Dimensions
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
"... Abstract—The problem of finding the closest point in highdimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as kd tree and Rtree, grows exponentially with dimension, making them impractical for dimensionality above 15. In ne ..."
Abstract

Cited by 154 (1 self)
 Add to MetaCart
Abstract—The problem of finding the closest point in highdimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as kd tree and Rtree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a userspecified distance e. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance e. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine e in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available upon request to search@cs.columbia.edu/CAVE/.
GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management
, 2006
"... We present a new algorithm, GPUTeraSort, to sort billionrecord widekey databases using a graphics processing unit (GPU) Our algorithm uses the data and task parallelism on the GPU to perform memoryintensive and computeintensive tasks while the CPU is used to perform I/O and resource management. We ..."
Abstract

Cited by 148 (10 self)
 Add to MetaCart
We present a new algorithm, GPUTeraSort, to sort billionrecord widekey databases using a graphics processing unit (GPU) Our algorithm uses the data and task parallelism on the GPU to perform memoryintensive and computeintensive tasks while the CPU is used to perform I/O and resource management. We therefore exploit both the highbandwidth GPU memory interface and the lowerbandwidth CPU main memory interface and achieve higher memory bandwidth than purely CPUbased algorithms. GPUTeraSort is a twophase task pipeline: (1) read disk, build keys, sort using the GPU, generate runs, write disk, and (2) read, merge, write. It also pipelines disk transfers and achieves nearpeak I/O performance. We have tested the performance of GPUTeraSort on billionrecord files using the standard Sort benchmark. In practice, a 3 GHz Pentium IV PC with $265 NVIDIA 7800 GT GPU is significantly faster than optimized CPUbased algorithms on much faster processors, sorting 60GB for a penny; the best reported PennySort priceperformance. These results suggest that a GPU coprocessor can significantly improve performance on large data processing tasks. 1.
Dynamic and efficient key management for access hierarchies
 In Proceedings of the ACM Conference on Computer and Communications Security
, 2005
"... Hierarchies arise in the context of access control whenever the user population can be modeled as a set of partially ordered classes (represented as a directed graph). A user with access privileges for a class obtains access to objects stored at that class and all descendant classes in the hierarchy ..."
Abstract

Cited by 119 (7 self)
 Add to MetaCart
(Show Context)
Hierarchies arise in the context of access control whenever the user population can be modeled as a set of partially ordered classes (represented as a directed graph). A user with access privileges for a class obtains access to objects stored at that class and all descendant classes in the hierarchy. The problem of key management for such hierarchies then consists of assigning a key to each class in the hierarchy so that keys for descendant classes can be obtained via efficient key derivation. We propose a solution to this problem with the following properties: (1) the space complexity of the public information is the same as that of storing the hierarchy; (2) the private information at a class consists of a single key associated with that class; (3) updates (i.e., revocations and additions) are handled locally in the hierarchy; (4) the scheme is provably secure against collusion; and (5) each node can derive the key of any of its descendant with a number of symmetrickey operations bounded by the length of the path between the nodes. Whereas many previous schemes had some of these properties, ours is the first that satisfies all of them. The security of our scheme is based on pseudorandom functions, without reliance on the Random Oracle Model. 18 Portions of this work were supported by Grants IIS0325345 and CNS06274488 from the
Adding Networks
, 2001
"... An adding network is a distributed data structure that supports a concurrent, lockfree, lowcontention implementation of a fetch&add counter; a counting network is an instance of an adding network that supports only fetch&increment. We present a lower bound showing that adding networks ha ..."
Abstract

Cited by 117 (33 self)
 Add to MetaCart
An adding network is a distributed data structure that supports a concurrent, lockfree, lowcontention implementation of a fetch&add counter; a counting network is an instance of an adding network that supports only fetch&increment. We present a lower bound showing that adding networks have inherently high latency. Any adding network powerful enough to support addition by at least two values a and b, where a > b > 0, has sequential executions in which each token traverses Ω(n/c) switching elements, where n is the number of concurrent processes, and c is a quantity we call oneshot contention; for a large class of switching networks and for conventional counting networks the oneshot contention is constant. On the contrary, counting networks have O(log n) latency [4,7]. This bound is tight. We present the first concurrent, lockfree, lowcontention networked data structure that supports arbitrary fetch&add operations.