Results 1  10
of
19
Alphabet Partitioning for Compressed Rank/Select and Applications
"... Abstract. We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zeroorder entropy of s. This data structure supports the queries access and rank in time O (lg lg σ), and the select query in constant time. This result imp ..."
Abstract

Cited by 18 (13 self)
 Add to MetaCart
Abstract. We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zeroorder entropy of s. This data structure supports the queries access and rank in time O (lg lg σ), and the select query in constant time. This result improves on previously known data structures using nH0(s) + o(n lg σ) bits, where on highly compressible instances the redundancy o(n lg σ) cease to be negligible compared to the nH0(s) bits that encode the data. The technique is based on combining previous results through an ingenious partitioning of the alphabet, and practical enough to be implementable. It applies not only to strings, but also to several other compact data structures. For example, we achieve (i) faster search times and lower redundancy for the smallest existing fulltext selfindex; (ii) compressed permutations π with times for π() and π −1 () improved to loglogarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets. 1
Compact RichFunctional Binary Relation Representations
"... Abstract. Binary relations are an important abstraction arising in a number of data representation problems. Each existing data structure specializes in the few basic operations required by one single application, and takes only limited advantage of the inherent redundancy of binary relations. We sh ..."
Abstract

Cited by 12 (7 self)
 Add to MetaCart
Abstract. Binary relations are an important abstraction arising in a number of data representation problems. Each existing data structure specializes in the few basic operations required by one single application, and takes only limited advantage of the inherent redundancy of binary relations. We show how to support more general operations efficiently, while taking better advantage of some forms of redundancy in practical instances. As a basis for a more general discussion on binary relation data structures, we list the operations of potential interest for practical applications, and give reductions between operations. We identify a set of operations that yield the support of all others. As a first contribution to the discussion, we present two data structures for binary relations, each of which achieves a distinct tradeoff between the space used to store and index the relation, the set of operations supported in sublinear time, and the time in which those operations are supported. The experimental performance of our data structures shows that they not only offer good time complexities to carry out many operations, but also take advantage of regularities that arise in practical instances in order to reduce space usage. 1
Wavelet Trees for All
"... The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabli ..."
Abstract

Cited by 10 (5 self)
 Add to MetaCart
The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabling compressed representations. New competitive solutions to a number of problems, based on wavelet trees, are appearing every year. In this survey we give an overview of wavelet trees and the surprising number of applications in which we have found them useful: basic and weighted point grids, sets of rectangles, strings, permutations, binary relations, graphs, inverted indexes, document retrieval indexes, fulltext indexes, XML indexes, and general numeric sequences.
New lower and upper bounds for representing sequences
 CoRR
"... Abstract. Sequence representations supporting queries access, select and rank are at the core of many data structures. There is a considerable gap between different upper bounds, and the few lower bounds, known for such representations, and how they interact with the space used. In this article we p ..."
Abstract

Cited by 9 (8 self)
 Add to MetaCart
Abstract. Sequence representations supporting queries access, select and rank are at the core of many data structures. There is a considerable gap between different upper bounds, and the few lower bounds, known for such representations, and how they interact with the space used. In this article we prove a strong lower bound for rank, which holds for rather permissive assumptions on the space used, and give matching upper bounds that require only a compressed representation of the sequence. Within this compressed space, operations access and select can be solved within almostconstant time. 1
MPI on a Million Processors
"... Abstract. Petascale machines with close to a million processors will soon be available. Although MPI is the dominant programming model today, some researchers and users wonder (and perhaps even doubt) whether MPI will scale to such large processor counts. In this paper, we examine this issue of how ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
Abstract. Petascale machines with close to a million processors will soon be available. Although MPI is the dominant programming model today, some researchers and users wonder (and perhaps even doubt) whether MPI will scale to such large processor counts. In this paper, we examine this issue of how scalable is MPI. We first examine the MPI specification itself and discuss areas with scalability concerns and how they can be overcome. We then investigate issues that an MPI implementation must address to be scalable. We ran some experiments to measure MPI memory consumption at scale on up to 131,072 processes or 80 % of the IBM Blue Gene/P system at Argonne National Laboratory. Based on the results, we tuned the MPI implementation to reduce its memory footprint. We also discuss issues in application algorithmic scalability to large process counts and features of MPI that enable the use of other techniques to overcome scalability limitations in applications. 1
A Fun Application of Compact Data Structures to Indexing Geographic Data ⋆
"... Abstract. The way memory hierarchy has evolved in recent decades has opened new challenges in the development of indexing structures in general and spatial access methods in particular. In this paper we propose an original approach to represent geographic data based on compact data structures used i ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
Abstract. The way memory hierarchy has evolved in recent decades has opened new challenges in the development of indexing structures in general and spatial access methods in particular. In this paper we propose an original approach to represent geographic data based on compact data structures used in other fields such as text or image compression. A wavelet treebased structure allows us to represent minimum bounding rectangles solving geographic range queries in logarithmic time. A comparison with classical spatial indexes, such as the Rtree, shows that our structure can be considered as a fun, yet seriously competitive, alternative to these classical approaches. Key words: geographic data, MBR, range query, wavelet tree. 1
Optimal Dynamic Sequence Representations
"... We describe a data structure that supports access, rank and select queries, as well as symbol insertions and deletions, on a string S[1, n] over alphabet [1..σ] in time O(lg n / lg lg n), which is optimal. The time is worstcase for the queries and amortized for the updates. This complexity is bette ..."
Abstract

Cited by 5 (3 self)
 Add to MetaCart
We describe a data structure that supports access, rank and select queries, as well as symbol insertions and deletions, on a string S[1, n] over alphabet [1..σ] in time O(lg n / lg lg n), which is optimal. The time is worstcase for the queries and amortized for the updates. This complexity is better than the best previous ones by a Θ(1 + lg σ / lg lg n) factor. Our structure uses nH0(S) + O(n + σ(lg σ + lg 1+ε n)) bits, where H0(S) is the zeroorder entropy of S and 0 < ε < 1 is any constant. This space redundancy over nH0(S) is also better, almost always, than that of the best previous dynamic structures, o(n lg σ)+O(σ(lg σ+lg n)). We can also handle general alphabets in optimal time, which has been an open problem in dynamic sequence representations.
A.: Scalability of communicators and groups in mpi
 In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
, 2010
"... As the number of cores inside compute clusters continues to grow, the scalability of MPI (Message Passing Interface) is important to ensure that programs can continue to execute on an everincreasing number of cores. One important scalability issue for MPI is the current implementation of communicat ..."
Abstract

Cited by 4 (1 self)
 Add to MetaCart
As the number of cores inside compute clusters continues to grow, the scalability of MPI (Message Passing Interface) is important to ensure that programs can continue to execute on an everincreasing number of cores. One important scalability issue for MPI is the current implementation of communicators and groups. Communicators and groups are an integral part of MPI and play an essential role in the design and use of libraries. It is challenging to create an MPI implementation to support communicators and groups to scale to the hundreds of thousands of processes that are possible in today’s clusters. In this paper we present the design and evaluation of techniques to support the scalability of communicators and groups in MPI. We have designed and implemented a finegrain version of MPI, called FGMPI based on MPICH2, that allows thousands of fullfledged MPI processes inside an operating system process. Using FGMPI we can create hundreds and thousands of MPI processes, which allowed us to implement and evaluate solutions to the scalability issues associated with communicators. We describe techniques to allow for sharing of group information inside processes, which required a redefinition of the contextid and the design of scalable operations to create the communicators. A set plus permutation framework is introduced for storing group information that makes use of a variety of different representations. We also propose a set, instead of map, representation for MPI group objects that takes advantage of our framework. Performance results are given for the execution of a MPI benchmark program with upwards of 100,000 processes with communicators created for various groups of different sizes and types. 1.
LRMTrees: Compressed Indices, Adaptive Sorting, and Compressed Permutations ⋆
"... Abstract. LRMTrees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer arrays in order to support range minim ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
Abstract. LRMTrees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer arrays in order to support range minimum queries on them. We describe how they yield many other convenient results in a variety of areas: compressed succinct indices for range minimum queries on partially sorted arrays; a new adaptive sorting algorithm; and a compressed succinct data structure for permutations supporting direct and inverse application in time inversely proportional to the permutation’s compressibility. 1
Efficient FullyCompressed Sequence Representations
, 2010
"... We present a data structure that stores a sequence s[1..n] over alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zeroorder entropy of s. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
We present a data structure that stores a sequence s[1..n] over alphabet [1..σ] in nH0(s) + o(n)(H0(s)+1) bits, where H0(s) is the zeroorder entropy of s. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in worstcase time O (lg lg σ) and average time O (lg H0(s)). The worstcase complexity matches the best previous results, yet these had been achieved with data structures using nH0(s) + o(n lg σ) bits. On highly compressible sequences the o(n lg σ) bits of the redundancy may be significant compared to the the nH0(s) bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our averagecase complexity is unprecedented. Our technique is based on partitioning the alphabet into characters of similar frequency. The subsequence corresponding to each group can then be encoded using fast uncompressed representations without harming the overall compression ratios, even in the redundancy. The result also improves upon the best current compressed representations of several other data structures. For example, we achieve (i) compressed redundancy, retaining the best time complexities, for the smallest existing fulltext selfindexes; (ii) compressed permutations π with times for π() and π −1 () improved to loglogarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets. We also point out various applications to inverted indexes, suffix arrays, binary relations, and data compressors. Our structure is practical on large alphabets. Our experiments show that, as predicted by theory, it dominates the space/time tradeoff map of all the sequence representations, both in synthetic and application scenarios.