Results 1–6 of 6
An Optimal Algorithm for the Distinct Elements Problem
Abstract

Cited by 68 (6 self)
We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet and Martin in their seminal paper in FOCS 1983. This problem has applications to query optimization, Internet routing, network topology, and data mining. For a stream of indices in {1, ..., n}, our algorithm computes a (1 ± ε)-approximation using an optimal O(ε^(-2) + log n) bits of space with 2/3 success probability, where 0 < ε < 1 is given. This probability can be amplified by independent repetition. Furthermore, our algorithm processes each stream update in O(1) worst-case time, and can report an estimate at any point mid-stream in O(1) worst-case time, thus settling both the space and time complexities simultaneously.
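The abstract's guarantee concerns a far more refined algorithm, but the basic idea of estimating distinct counts from hash values can be illustrated with a simple k-minimum-values (KMV) sketch, which is not the paper's algorithm, only a sketch of the same problem under the assumption that SHA-256 behaves like a uniform hash:

```python
import bisect
import hashlib

def kmv_estimate(stream, k=64):
    """Estimate the number of distinct elements in `stream` with a
    k-minimum-values sketch: hash each item to [0, 1), retain the k
    smallest distinct hash values, and return (k - 1) / (k-th smallest).
    This is a classical sketch, not the optimal algorithm of the paper."""
    mins = []  # the k smallest distinct hash values seen so far, sorted
    for item in stream:
        # Map the item to a pseudo-uniform value in [0, 1).
        h = int(hashlib.sha256(str(item).encode()).hexdigest(), 16) / 2.0**256
        i = bisect.bisect_left(mins, h)
        if i < len(mins) and mins[i] == h:
            continue  # duplicate element: its hash is already accounted for
        if len(mins) < k:
            mins.insert(i, h)
        elif h < mins[-1]:
            mins.insert(i, h)
            mins.pop()  # evict the largest of the k retained hashes
    if len(mins) < k:
        return len(mins)       # fewer than k distinct items: count is exact
    return (k - 1) / mins[-1]  # standard KMV estimator
```

With fewer than k distinct items the sketch degenerates to an exact count; beyond that, the relative error shrinks roughly as 1/sqrt(k).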
More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries
, 2009
Abstract

Cited by 12 (2 self)
We consider the problem of representing, in a compressed format, a bitvector S of m bits with n 1s, supporting the following operations, where b ∈ {0, 1}:
• rank_b(S, i) returns the number of occurrences of bit b in the prefix S[1..i];
• select_b(S, i) returns the position of the i-th occurrence of bit b in S.
Such a data structure is called a fully indexable dictionary (FID) [Raman, Raman, and Rao, 2007], and is at least as powerful as predecessor data structures. Viewing S as a set X = {x1, x2, ..., xn} of n distinct integers drawn from a universe [m] = {1, ..., m}, the predecessor of integer y ∈ [m] in X is given by select_1(S, rank_1(S, y − 1)). FIDs have many applications in succinct and compressed data structures, as they are often involved in the construction of succinct representations for a variety of abstract data types. Our focus is on space-efficient FIDs on the RAM model with word size Θ(lg m) and constant time for all operations, so that the time cost is independent of the input size. Given the bitstring S to be encoded, having length m and containing n ones, the minimal amount of information that needs to be stored is B(n, m) = ⌈log (m choose n)⌉. The state of the art in building a FID for S is given in [Pǎtraşcu, 2008], using B(n, m) + O(m/((log m)/t)^t) + O(m^(3/4)) bits to support the operations in O(t) time. Here, we propose a parametric data structure exhibiting a time/space trade-off such that, for any real constants 0 < δ ≤ 1/2, 0 < ε ≤ 1, and integer s > 0, it uses …
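The rank/select interface and the predecessor identity above can be made concrete with a deliberately naive, uncompressed reference implementation — O(m) words of space rather than the B(n, m) + o(m) bits the paper targets:

```python
class NaiveFID:
    """Uncompressed reference for a fully indexable dictionary.
    Positions are 1-based, matching the S[1..i] notation above.
    Rank is O(1) via a prefix-sum table; select here is O(m)."""

    def __init__(self, bits):
        self.bits = bits
        # rank1_prefix[i] = number of 1s in the prefix of length i
        self.rank1_prefix = [0]
        for bit in bits:
            self.rank1_prefix.append(self.rank1_prefix[-1] + bit)

    def rank(self, b, i):
        """Occurrences of bit b in the prefix S[1..i]."""
        r1 = self.rank1_prefix[i]
        return r1 if b == 1 else i - r1

    def select(self, b, i):
        """Position of the i-th occurrence of bit b, or None."""
        count = 0
        for pos, bit in enumerate(self.bits, start=1):
            if bit == b:
                count += 1
                if count == i:
                    return pos
        return None

    def predecessor(self, y):
        """Largest element of X strictly below y, via the identity
        predecessor(y) = select_1(S, rank_1(S, y - 1))."""
        r = self.rank(1, y - 1)
        return self.select(1, r) if r > 0 else None
```

For X = {2, 5, 9} over [10], the bitvector is 0100100010, and predecessor(9) resolves to select_1(S, rank_1(S, 8)) = select_1(S, 2) = 5.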
Fast compressed tries through path decompositions
 CoRR
Abstract

Cited by 7 (2 self)
Tries are popular data structures for storing a set of strings, where common prefixes are represented by common root-to-node paths. More than 50 years of usage have produced many variants and implementations to overcome some of their limitations. We explore new succinct representations of path-decomposed tries and experimentally evaluate the corresponding reduction in space usage and memory latency, comparing with the state of the art. We study the following applications: compressed string dictionaries and monotone minimal perfect hashing for strings. For compressed string dictionaries, we obtain data structures that outperform other state-of-the-art compressed dictionaries in space efficiency while obtaining predictable query times that are competitive with the data structures preferred by practitioners. On real-world datasets, our compressed tries obtain the smallest space (except for one case) and have the fastest lookup times, whereas access times are at most 20% slower than the best-known solutions. For monotone minimal perfect hashing of strings, our compressed tries perform several times faster than other trie-based monotone perfect hash functions while occupying nearly the same space. On real-world datasets, our tries are approximately 2 to 5 times faster than previous solutions, with a space occupancy less than 10% larger.
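The shared-prefix property the abstract starts from is easy to see in a plain pointer-based trie; the paper's contribution is compressing such a structure via path decomposition, which this sketch does not attempt:

```python
def build_trie(strings):
    """Plain nested-dict trie: each string is a root-to-node path,
    and strings with a common prefix share the nodes on that prefix.
    "$" marks the end of a stored string."""
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})  # reuse or create the child
        node["$"] = True
    return root

def trie_contains(root, s):
    """Follow the root-to-node path for s; s is stored only if the
    final node carries the end-of-string marker."""
    node = root
    for ch in s:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node
```

Here "trie", "tree", and "trek" share the nodes for "tr" (and "tre" for the latter two), which is the space saving the succinct representations then push much further.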
Sketching and Streaming High-Dimensional Vectors
, 2011
Abstract

Cited by 3 (0 self)
A sketch of a dataset is a small-space data structure supporting some pre-specified set of queries (and possibly updates) while consuming space substantially sublinear in the space required to actually store all the data. Furthermore, it is often desirable, or required by the application, that the sketch itself be computable by a small-space algorithm given just one pass over the data, a so-called streaming algorithm. Sketching and streaming have found numerous applications in network traffic monitoring, data mining, trend detection, sensor networks, and databases. In this thesis, I describe several new contributions in the area of sketching and streaming algorithms:
• The first space-optimal streaming algorithm for the distinct elements problem. Our algorithm also achieves O(1) update and reporting times.
• A streaming algorithm for Hamming norm estimation in the turnstile model which achieves the best known space complexity.
External-Memory Multimaps
, 2011
Abstract

Cited by 1 (1 self)
Many data structures support dictionaries, also known as maps or associative arrays, which store and manage a set of key-value pairs. A multimap is a generalization that allows multiple values to be associated with the same key. For example, the inverted file data structure that is used prevalently in the infrastructure supporting search engines is a type of multimap, where words are used as keys and document pointers are used as values. We study the multimap abstract data type and how it can be implemented efficiently online in external-memory frameworks, with constant expected I/O performance. The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multi-queue implementation to cope with varying numbers of values per key. Our external-memory results are for the standard two-level memory model.
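The multimap abstract data type itself is simple; what follows is an in-memory illustration of its interface only (a dict of lists), standing in for the paper's external-memory structure built on bucketed cuckoo hashing:

```python
class Multimap:
    """In-memory multimap sketch: each key maps to a list of values.
    Illustrates the ADT, not the paper's external-memory construction."""

    def __init__(self):
        self._table = {}

    def insert(self, key, value):
        """Associate one more value with key (duplicates allowed)."""
        self._table.setdefault(key, []).append(value)

    def get(self, key):
        """All values associated with key, in insertion order."""
        return list(self._table.get(key, []))

    def remove(self, key, value):
        """Remove one occurrence of (key, value), if present."""
        values = self._table.get(key, [])
        if value in values:
            values.remove(value)
            if not values:
                del self._table[key]
```

In the inverted-file example from the abstract, `insert("word", doc_id)` posts a document pointer under a word, and `get("word")` retrieves the word's posting list.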
and Sciences for Women
Abstract
For many contemporary applications, such as distributed multimedia systems, rapid transmission of images is necessary. The cost of transmission and storage tends to be directly proportional to the volume of data; digital image compression therefore becomes necessary to minimize that cost. A number of digital image compression algorithms have been developed and standardized. The method proposed by the Joint Photographic Experts Group (JPEG) is a lossy compression technique. An improved version of JPEG is JPEG2000, which is currently the most popular compressor. This paper presents a new method for compressing non-negative integers in JPEG2000 using Gamma code compressors, obtained by modifying the JPEG2000 architecture: the entropy coder is replaced by the new coders. Simulated experiments show that the proposed methods give better image quality than JPEG2000 at any given bit rate.
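The Gamma code the abstract refers to is presumably the Elias gamma code for positive integers; a minimal encoder/decoder pair shows how it works (this illustrates the code itself, not the paper's modified JPEG2000 entropy coder):

```python
def gamma_encode(n):
    """Elias gamma code of a positive integer n: write n in binary
    (which starts with a 1), then prefix it with one 0 per bit after
    that leading 1. E.g. 5 = 101b encodes as 00 + 101 = 00101."""
    b = bin(n)[2:]  # binary representation, leading '1' first
    return "0" * (len(b) - 1) + b

def gamma_decode(bits):
    """Decode one gamma codeword from the front of a bit string.
    Count leading zeros z, then read the next z + 1 bits as the value.
    Returns (value, remaining_bits) so codewords can be chained."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    value = int(bits[zeros:2 * zeros + 1], 2)
    return value, bits[2 * zeros + 1:]
```

Gamma codes spend 2⌊log2 n⌋ + 1 bits on n, so they favor small values — which is why they are a natural drop-in for coefficient streams dominated by small non-negative integers.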