Results 1 - 9 of 9
Compressed Bloom Filters
, 2001
Abstract

Cited by 193 (10 self)
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications the space savings outweigh this drawback when the probability of an error is sufficiently low. We introduce compressed Bloom filters, which improve performance when the Bloom filter is passed as a message, and its transmission size is a limiting factor. For example, Bloom filters have been suggested as a means for sharing Web cache information. In this setting, proxies do not share the exact contents of their caches, but instead periodically broadcast Bloom filters representing their cache. By using compressed Bloom filters, proxies can reduce the number of bits broadcast, the false positive rate, and/or the amount of computation per lookup. The cost is the processing time for compression and decompression, which can use simple arithmetic coding, and more memory use at the proxies, which utilize the larger uncompressed form of the Bloom filter.
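The idea in this abstract can be illustrated with a minimal sketch. It is not the paper's construction: the class name `BloomFilter`, the salted SHA-256 hashing, the parameters `m_bits` and `k_hashes`, and the example URLs are all assumptions made for illustration, and `zlib` stands in for the arithmetic coding the abstract mentions. The sketch builds a deliberately sparse filter (many bits, few hash functions), compresses it for transmission, and rebuilds it on the receiving side.

```python
import hashlib
import zlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative only)."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True for every added item; may also be True for others (false positive).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )

    def to_message(self):
        # Assumption: zlib replaces the arithmetic coder named in the abstract.
        return zlib.compress(bytes(self.bits), 9)

    @classmethod
    def from_message(cls, payload, m_bits, k_hashes):
        bf = cls(m_bits, k_hashes)
        bf.bits = bytearray(zlib.decompress(payload))
        return bf


# Hypothetical cache contents: 1000 URLs in a sparse 64,000-bit filter.
urls = [f"http://example.com/page{i}" for i in range(1000)]
sparse = BloomFilter(m_bits=64_000, k_hashes=2)
for u in urls:
    sparse.add(u)

message = sparse.to_message()  # what a proxy would broadcast
restored = BloomFilter.from_message(message, 64_000, 2)
```

Comparing `len(message)` with `len(sparse.bits)` shows the transmission saving for this sparse setting, while the receiver still holds the larger uncompressed array in memory, which is the trade-off the abstract describes.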
Towards Compressing Web Graphs
 In Proc. of the IEEE Data Compression Conference (DCC)
, 2000
Abstract

Cited by 80 (1 self)
In this paper, we consider the problem of compressing graphs of the link structure of the World Wide Web. We provide efficient algorithms for such compression that are motivated by recently proposed random graph models for describing the Web.
The Scalable Hyperlink Store
 HT'09
, 2009
Abstract

Cited by 10 (5 self)
This paper describes the Scalable Hyperlink Store, a distributed in-memory “database” for storing large portions of the web graph. SHS is an enabler for research on structural properties of the web graph as well as new link-based ranking algorithms. Previous work on specialized hyperlink databases focused on finding efficient compression algorithms for web graphs. By contrast, this work focuses on the systems issues of building such a database. Specifically, it describes how to build a hyperlink database that is fast, scalable, fault-tolerant, and incrementally updateable.
DACs: Bringing Direct Access to Variable-Length Codes
, 2012
Abstract

Cited by 3 (1 self)
We present a new variable-length encoding scheme for sequences of integers, Directly Addressable Codes (DACs), which enables direct access to any element of the encoded sequence without the need of any sampling method. Our proposal is a kind of implicit data structure that introduces synchronism in the encoded sequence without using asymptotically any extra space. We show some experiments demonstrating that the technique is not only simple, but also competitive in time and space with existing solutions in several applications, such as the representation of LCP arrays or high-order entropy-compressed sequences.
On the Hardness of Finding Optimal Multiple Preset Dictionaries
Abstract

Cited by 1 (0 self)
We show that the following simple compression problem is NP-hard: given a collection of documents, find the pair of Huffman dictionaries that minimizes the total compressed size of the collection, where the best dictionary from the pair is used to compress each document. We also show the NP-hardness of finding optimal multiple preset dictionaries for LZ’77-based compression schemes. Our reductions make use of the catalog segmentation problem, a natural partitioning problem. Our results justify heuristic attacks used in practice. Index Terms: Huffman coding, LZ’77, NP-completeness, preset dictionaries, two-stage compression.
Www.elsevier.com/locate/jvlc
Abstract
Content-based image retrieval (CBIR) is a challenging task. Current research works attempt to obtain and use the semantics of images to perform better retrieval. Towards this goal, segmentation of an image into regions has been used in recent years, since local properties of regions can help matching objects between images and thereby contribute towards a more effective CBIR. This paper improves on a CBIR technique called SNL (Sridhar, Nascimento, Li) that utilizes the region-based properties of the images. In SNL each image is segmented and features capturing the color, shape, size and spatial position of the obtained regions are extracted. Regions are then compared using the integrated region matching (IRM) distance measure, which is not a metric, which prevents the use of metric access structures or filtering techniques based on the triangle inequality. We overcome this issue by using MiCRoM, a true metric distance to compare segmented images. The resulting approach, called SNL*, can be used in conjunction with a filtering technique to reduce substantially the number of images compared. Albeit metric, MiCRoM is computationally expensive. We address this drawback in an approach where we replace the expensive metric distance by the inexpensive original (non-metric) IRM distance. We found that one can still make use of the same filtering technique, at the expense of little loss in retrieval effectiveness. Thus, the main contribution of this paper is a very effective and highly efficient region-based image retrieval technique. © 2002 Elsevier Science Ltd. All rights reserved.
39 Distributions in text
, 2005
Abstract
The frequency of words and other linguistic units plays a central role in all branches of corpus linguistics. Indeed, the use of frequency information distinguishes corpus-based methodology from other approaches to language. Thus, not surprisingly, the distribution of frequencies of words and combinations of
New Algorithms on Wavelet Trees and Applications to Information Retrieval
Abstract
Wavelet trees are widely used in the representation of sequences, permutations, text collections, binary relations, discrete points, and other succinct data structures. We show, however, that this still falls short of exploiting all of the virtues of this versatile data structure. In particular we show how to use wavelet trees to solve fundamental algorithmic problems such as range quantile queries, range next value queries, and range intersection queries. We explore several applications of these queries in Information Retrieval, in particular document retrieval in hierarchical and temporal documents, and in the representation of inverted lists.
On Compressing Permutations and Adaptive Sorting
Abstract
We prove that, given a permutation $\pi$ over $[1..n]$ formed of $\mathit{nRuns}$ sorted blocks of sizes given by the vector $R = \langle r_1, \ldots, r_{\mathit{nRuns}} \rangle$, there exists a compressed data structure encoding $\pi$ in $n(1 + H(R)) = n + \sum_{i=1}^{\mathit{nRuns}} r_i \log_2 \frac{n}{r_i} \le n(1 + \log_2 \mathit{nRuns})$ bits while supporting access to the values of $\pi()$ and $\pi^{-1}()$ in time $O(\log \mathit{nRuns} / \log \log n)$ in the worst case and $O(H(R) / \log \log n)$ on average, when the argument is uniformly distributed over $[1..n]$. This data structure can be constructed in time $O(n(1 + H(R)))$, which yields an improved adaptive sorting algorithm. Similar results on compressed data structures for permutations and adaptive sorting algorithms are proved for other preorder measures of practical and theoretical interest. Keywords: Compression, permutations, succinct data structures, adaptive sorting.
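The space bound n(1 + H(R)) in this abstract can be evaluated numerically. The sketch below assumes that "sorted blocks" means maximal ascending contiguous runs and that H(R) is the empirical entropy of the run-length distribution; the function names `run_lengths`, `runs_entropy`, and `space_bound_bits` are hypothetical. It only computes the bound and does not build the compressed data structure itself.

```python
import math


def run_lengths(perm):
    """Lengths of the maximal ascending contiguous runs of perm (assumed block model)."""
    if not perm:
        return []
    lengths = [1]
    for prev, cur in zip(perm, perm[1:]):
        if cur > prev:
            lengths[-1] += 1   # current run continues
        else:
            lengths.append(1)  # a descent starts a new run
    return lengths


def runs_entropy(lengths):
    """H(R) = sum over runs of (r_i / n) * log2(n / r_i)."""
    n = sum(lengths)
    return sum((r / n) * math.log2(n / r) for r in lengths)


def space_bound_bits(perm):
    """The bound n * (1 + H(R)) from the abstract, in bits."""
    return len(perm) * (1 + runs_entropy(run_lengths(perm)))


# Example permutation over [1..8] with runs (1 2 3 4 8), (7), (5 6).
perm = [1, 2, 3, 4, 8, 7, 5, 6]
lengths = run_lengths(perm)
bound = space_bound_bits(perm)
upper = len(perm) * (1 + math.log2(len(lengths)))
```

For an already sorted permutation there is a single run, H(R) is zero, and the bound reduces to n bits. In general `bound` never exceeds `upper`, since the entropy of the run lengths is at most the base-2 logarithm of the number of runs.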