## Compact Data Structures with Fast Queries (2005)

Citations: 4 (0 self)

### BibTeX

@TECHREPORT{Blandford05compactdata,
    author = {Daniel K. Blandford and Christos Faloutsos and Danny Sleator},
    title = {Compact Data Structures with Fast Queries},
    institution = {},
    year = {2005}
}

### Abstract

Many applications dealing with large data structures can benefit from keeping them in compressed form. Compression has many benefits: it can allow a representation to fit in main memory rather than swapping out to disk, and it improves cache performance since it allows more data to fit into the cache. However, a data structure is only useful if it allows the application to perform fast queries (and updates) to the data.

### Citations

2621 | Normalized Cuts and Image Segmentation
- Shi, Malik
- 2000
Citation Context: ...s been used for many purposes, including VLSI layout [4], nested dissection for solving linear systems [80], partitioning graphs on to parallel processors [116], clustering [118], and computer vision [112]. Although finding a minimum separator for a graph is NP-hard, there are many algorithms that find good approximations [104]. Here we briefly review why graphs have good separators. One reason that ma...

2165 | The PageRank citation ranking: Bringing order to the web
- Page, Brin, et al.
- 1999
Citation Context: ...orms some set operations to combine them into a result, and reports them to the user. It may be desirable to maintain the documents ordered, for example, by a ranking of the pages based on importance [95]. Using difference coding (as described in Section 2.4) these lists can be compressed into an array of bits using 5 or 6 bits per edge [136, 88, 12], but such representations are not well suited for m...

1940 | Collective dynamics of small-world networks
- Watts, Strogatz
- 1998
Citation Context: ...n 3-dimensions. Furthermore many graphs without pre-defined embeddings in low dimensional spaces have small separators [132]. For example, the link structure of the web has small separators, as our experiments show. In this chapter we are interested in compact representations of separable graphs (as described in Section 2.... [Footnote 1: This chapter is based on work done with Guy Blelloch and Ian Kash [15, 16].]

808 | Managing Gigabytes: Compressing and Indexing Documents and Images
- Witten, Moffat, et al.
- 1999
Citation Context: ...ode by prepending to each codeword a number of zeroes equal to that codeword’s length minus one. This code is the gamma code [50]. The gamma code is only one of a wide class of prefix-free codes (see [136] for many others). For theoretical work this thesis will use gamma codes as they are easy to describe and conceptually easy to encode and decode. Decoding gamma codes. Using a lookup table of size O...
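
The gamma code described in this excerpt can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `gamma_encode` and `gamma_decode_stream` are hypothetical, bits are held in a string for readability, and the decoder scans bit by bit rather than using the lookup table the excerpt mentions.

```python
def gamma_encode(n: int) -> str:
    """Return the gamma code of a positive integer as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]                      # binary code, always starts with '1'
    return "0" * (len(binary) - 1) + binary  # prepend (length - 1) zeroes


def gamma_decode_stream(bits: str) -> list:
    """Decode a concatenation of gamma codewords back into integers."""
    values, pos = [], 0
    while pos < len(bits):
        zeroes = 0
        while bits[pos + zeroes] == "0":     # count the zero prefix
            zeroes += 1
        end = pos + 2 * zeroes + 1           # prefix plus (zeroes + 1) binary digits
        values.append(int(bits[pos + zeroes:end], 2))
        pos = end
    return values


# Because the code is prefix-free, codewords can be concatenated without separators.
stream = "".join(gamma_encode(n) for n in [1, 2, 3, 4, 9])
print(stream)
print(gamma_decode_stream(stream))
```

The zero prefix tells the decoder how many binary digits follow, which is what makes the concatenated stream unambiguous.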

797 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
Citation Context: ...ex separators. For theoretical purposes we will assume the existence of a graph separator algorithm that returns a separator within the $O(n^c)$ bound. For experimental purposes we find that the Metis [71] heuristic graph separator library works well. Chapter 3: Compact Dictionaries With Variable-Length Keys and Data. 3.1 Introduction. The dictionary problem is to maintain an n-element set of keys $s_i$ w...

677 | Universal classes of hash functions
- Carter, Wegman
- 1979
Citation Context: ...preting it as a number. We denote this padded numerical representation of $s_i$ by $x_i$. We say a family $H$ of hash functions onto $2^q$ elements is $k$-universal if for random $h \in H$, $\Pr(h(x_1) = h(x_2)) \le k/2^q$ [32], and is $k$-pairwise independent if for random $h \in H$, $\Pr(h(x_1) = y_1 \wedge h(x_2) = y_2) \le k/2^{2q}$ for any $x_1 \ne x_2$ in the domain, and $y_1, y_2$ in the range. We wish to construct hash functions $h'$, $h''$. The fu...
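
To make the k-universal condition concrete, the following sketch checks it exhaustively for one small family. It uses the textbook family h(x) = ((a·x + b) mod p) mod m commonly attributed to Carter and Wegman; this is an assumption chosen for illustration and not necessarily the family constructed in the thesis. The names `make_hash` and `collision_probability` and the parameters p = 17, q = 2 are hypothetical.

```python
from itertools import product


def make_hash(a, b, p, m):
    """h(x) = ((a*x + b) mod p) mod m, for keys x in range(p)."""
    return lambda x: ((a * x + b) % p) % m


def collision_probability(x1, x2, p, m):
    """Exact Pr[h(x1) == h(x2)] over all a in 1..p-1 and b in 0..p-1."""
    family = [make_hash(a, b, p, m) for a, b in product(range(1, p), range(p))]
    collisions = sum(h(x1) == h(x2) for h in family)
    return collisions / len(family)


p, q = 17, 2          # hypothetical small prime and range size 2^q
m = 1 << q
worst = max(collision_probability(x1, x2, p, m)
            for x1 in range(p) for x2 in range(p) if x1 != x2)

# The family is k-universal with k = 1 if no pair collides more often than 1/2^q.
print(worst <= 1 / m)
```

Enumerating the whole family is feasible only for tiny parameters; it is used here so the check is exact rather than sampled.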

637 | LEDA: a platform for combinatorial and geometric computing
- Mehlhorn, Näher
- 1999
Citation Context: ...f vertex i not j. Random insertion inserts the edges in random order. We compare the performance of our data structure to that of standard linked-list and array-based data structures, and to the LEDA [84] package. Since small differences in the implementation can make significant differences in performance, here we describe important details of these implementations. Adjacency lists. We use a singly l...

455 | Primitives for the manipulation of general subdivisions and the computation of Voronoi diagrams
- Guibas, Stolfi
- 1985
Citation Context: ...urns in which of three orders it is held. There are many closely related data structures based on edges, including the doubly connected edge list [89], winged-edge [9], half-edge [133], and quad-edge [59] structures. In addition to triangulated meshes, these data structures can all be used for polygonal meshes. In these data structures each edge maintains pointers to its two neighboring vertices and t...

422 | Syntactic Clustering of the Web
- Broder, Glassman, et al.
- 1997
Citation Context: ...s among the neighbors of only those terms with less than a threshold number of neighbors τ. All other terms are simply deleted from the graph. (This technique is similar to that used by Broder et al. [27] for identifying near-duplicate web pages.) Pseudocode for this part of our algorithm is shown in Figure 6.1. Split-Index. Once BUILD-GRAPH has produced a similarity graph, the next step is to derive ...

407 | Triangle: engineering a 2D quality mesh generator and Delaunay triangulator
- Shewchuk
- 1996

388 | A separator theorem for planar graphs
- Lipton, Tarjan
- 1979
Citation Context: ...orem if there are constants α < 1 and β > 0 such that every graph in S with n vertices has a cut set with at most βf(n) vertices that separates the graph into components with at most αn vertices each [81]. In this thesis we are particularly interested in the compression of classes of graphs for which $f(n)$ is $n^c$ for some $c < 1$. One such class is the class of planar graphs, which satisfies an $n^{1/2}$-separa...

347 | Universal Codeword Sets and Representations of the Integers
- Elias
- 1975
Citation Context: ..., et cetera. It is possible to convert the binary code into a prefix-free code by prepending to each codeword a number of zeroes equal to that codeword’s length minus one. This code is the gamma code [50]. The gamma code is only one of a wide class of prefix-free codes (see [136] for many others). For theoretical work this thesis will use gamma codes as they are easy to describe and conceptually easy ...

323 | Skip Lists: A Probabilistic Alternative to Balanced Trees
- Pugh
- 1990
Citation Context: ...[108] supports all these operations in the time listed in the expected case. Both of these can be made purely functional. As a third example, our representation using a skip-list dictionary structure [100] supports these operations in the same time bounds (expected case) but is not purely functional. [Figure 4.1 content: set {306, 309, 312, 314, 315, 319}; first element and differences 306, 3, 3, 2, 1, 4; codewords 0100110010 011 011 010 1 00100.] Figure 4.1: The encoding...
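
The figure values in this excerpt show difference coding: a sorted set is stored as its first element followed by gamma-coded gaps between neighbors. The sketch below reproduces the gap codewords from the figure. It is a minimal illustration, not the thesis implementation: the helper names are hypothetical, and the first element is kept as a plain integer because the excerpt does not say how the head codeword `0100110010` is formed.

```python
def gamma_encode(n):
    """Gamma code of n >= 1: binary digits preceded by (length - 1) zeroes."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def difference_code(sorted_values):
    """Return (first element, gamma codewords of the successive gaps).

    Assumes the values are strictly increasing, so every gap is >= 1.
    """
    first = sorted_values[0]
    gaps = [b - a for a, b in zip(sorted_values, sorted_values[1:])]
    return first, [gamma_encode(g) for g in gaps]


def difference_decode(first, codewords):
    """Rebuild the sorted values by summing the decoded gaps."""
    values = [first]
    for codeword in codewords:
        values.append(values[-1] + int(codeword, 2))  # leading zeroes do not change the value
    return values


first, codewords = difference_code([306, 309, 312, 314, 315, 319])
print(first, codewords)
print(difference_decode(first, codewords))
```

Small gaps get short codewords, which is why denser sets compress better under this scheme.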

321 | External memory algorithms and data structures, in External Memory Algorithms and Visualization
- Vitter
- 1999
Citation Context: ...external memory. To avoid thrashing, this requires designing algorithms for which the access to the mesh is carefully orchestrated. Although several such external memory algorithms have been designed [55, 45, 43, 83, 128, 7, 124, 5], these algorithms can be much more complicated than their main-memory counterparts, and can be significantly slower. The field of compressed meshes has received considerable attention [44, 61, 121, 9...

318 | Geometry compression
- Deering
- 1998
Citation Context: ...128, 7, 124, 5], these algorithms can be much more complicated than their main-memory counterparts, and can be significantly slower. The field of compressed meshes has received considerable attention [44, 61, 121, 98, 105, 120, 70, 66, 53]. In three dimensions, for example, these methods can compress a tetrahedral mesh to less than a byte per tetrahedron [120], about 6 bytes/vertex (not including vertex coordinates). These techniques, h...

292 | Partitioning of unstructured problems for parallel processing
- Simon
- 1991
Citation Context: ...d graphs. The separator property of graphs has been used for many purposes, including VLSI layout [4], nested dissection for solving linear systems [80], partitioning graphs on to parallel processors [116], clustering [118], and computer vision [112]. Although finding a minimum separator for a graph is NP-hard, there are many algorithms that find good approximations [104]. Here we briefly review why gr...

261 | Edgebreaker: Connectivity compression for triangle meshes
- Rossignac
- 1999
Citation Context: ...ompression is better for denser sets (as predicted by the space bound given above). Separable Graphs (Chapter 5). Recently there has been a great deal of interest in compact representations of graphs [125, 72, 65, 82, 64, 105, 92, 68, 91, 40, 46, 65, 28, 1, 119, 22]. Using difference coding it is possible to create several different compact representations for separable graphs. (A graph is defined to be separable if it and all its subgraphs can be partitioned ...

256 | Sorting and Searching, volume 3 of The Art of Computer Programming
- Knuth
- 1998
Citation Context: ...g n) + O(n). Cleary [42] showed how to achieve $(1 + \epsilon)B + O(n)$ bits with $O(1/\epsilon^2)$ expected time for lookup and insertion while allowing satellite data. His structure used the technique of quotienting [74], which involves storing only part of each key in a hash bucket; the part not stored can be reconstructed using the index of the bucket containing the key. Brodnik and Munro [29] described a static st...
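
The quotienting idea in this excerpt can be illustrated with a toy set: the low bits of a key choose the bucket, only the remaining high bits are stored, and the full key is rebuilt from the stored part plus the bucket index. This is a simplified sketch under stated assumptions. The class `QuotientedSet` and its parameters are hypothetical, the "hash" is just the low-order bits of the key, and buckets are Python lists, so it shows the reconstruction step rather than Cleary's space bounds.

```python
class QuotientedSet:
    """Toy set of key_bits-wide integers storing only the high bits of each key."""

    def __init__(self, key_bits, index_bits):
        self.key_bits = key_bits
        self.index_bits = index_bits
        self.buckets = [[] for _ in range(1 << index_bits)]

    def _split(self, key):
        if not 0 <= key < (1 << self.key_bits):
            raise ValueError("key out of range")
        index = key & ((1 << self.index_bits) - 1)  # low bits select the bucket
        quotient = key >> self.index_bits           # high bits are what gets stored
        return index, quotient

    def insert(self, key):
        index, quotient = self._split(key)
        if quotient not in self.buckets[index]:
            self.buckets[index].append(quotient)

    def __contains__(self, key):
        index, quotient = self._split(key)
        return quotient in self.buckets[index]

    def keys(self):
        """Rebuild every key from its stored quotient and its bucket index."""
        for index, bucket in enumerate(self.buckets):
            for quotient in bucket:
                yield (quotient << self.index_bits) | index


s = QuotientedSet(key_bits=10, index_bits=4)
for key in (306, 309, 1023):
    s.insert(key)
print(sorted(s.keys()))
```

Each stored entry needs only `key_bits - index_bits` bits, since the bucket position supplies the rest.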

249 | Geometric Compression through Topological Surgery
- Taubin, Rossignac
- 1998
Citation Context: ...128, 7, 124, 5], these algorithms can be much more complicated than their main-memory counterparts, and can be significantly slower. The field of compressed meshes has received considerable attention [44, 61, 121, 98, 105, 120, 70, 66, 53]. In three dimensions, for example, these methods can compress a tetrahedral mesh to less than a byte per tetrahedron [120], about 6 bytes/vertex (not including vertex coordinates). These techniques, h...

239 | An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms
- Leighton, Rao
- 1988
Citation Context: ...he time needed to recursively separate the graph (all other aspects take linear time). A polylogarithmic approximation of the separator size is sufficient for our bounds so the Leighton-Rao separator [78] gives a polynomial time separator for graphs satisfying an $O(n^c)$, $c < 1$ edge-separator theorem. For special graphs more efficient solutions are known, e.g., for planar graphs [81] and well shaped m...

231 | A dichromatic framework for balanced trees
- Guibas, Sedgewick
- 1978
Citation Context: ...e structure. Section 4.6 gives experimental results for the second representation. To show the versatility of the compression technique, we applied it to two separate data structures: red-black trees [60] and functional treaps [6]. 4.2 Representation With Dictionaries. Here we describe a representation for ordered sets based on our variable-bit-length dictionary from Section 3.3. We would like to repre...

229 | Run-length encodings
- Golomb
- 1966
Citation Context: ... 6.4 Experimentation. Compression Techniques. We tested several common difference codes to see how much improvement our algorithm could provide. The codes we tested include the delta code, Golomb code [54], and arithmetic code. These codes are described in more detail by Witten, Moffat, and Bell in [136]. We also tested the binary interpolative compression method of Moffat and Stuiver [87]. This code w...

197 | Recent directions in netlist partitioning: A survey
- Alpert, Kahng
- 1995
Citation Context: ...ices. Along with sparsity, having good separators is probably the most universal property of real-world graphs. The separator property of graphs has been used for many purposes, including VLSI layout [4], nested dissection for solving linear systems [80], partitioning graphs on to parallel processors [116], clustering [118], and computer vision [112]. Although finding a minimum separator for a graph ...

195 | A Delaunay Refinement Algorithm for Quality 2-Dimensional Mesh Generation
- Ruppert
- 1995
Citation Context: ... Delaunay Refinement. To test our implementation’s performance for the case when new points are dynamically generated at runtime, we implemented a 2D Delaunay refinement code in the style of Ruppert [106]. We augment a Delaunay triangulation by adding circumcenters of badly shaped triangles while maintaining the Delaunay property. When the initial triangulation is built we walk through the mesh once a...

192 | Succinct indexable dictionaries with applications to encoding k-ary trees and multisets
- Raman, Raman, et al.
Citation Context: ...sume the size of each string $|s_i| \ge 1$, $|t_i| \ge 1$ for all bitstrings $s_i$ and $t_i$. There has been significant recent work involving data structures that use near optimal space while supporting fast access [68, 91, 40, 29, 96, 57, 102, 51, 15, 103]. The dictionary problem in particular has been well-studied in the case of fixed-length keys. The information-theoretic lower bound for representing n elements from a universe U is $B = \lceil \log \binom{|U|}{n} \rceil$...

188 | Compressed suffix arrays and suffix trees with applications to text indexing and string matching
- Grossi, Vitter
- 2005
Citation Context: ...sume the size of each string $|s_i| \ge 1$, $|t_i| \ge 1$ for all bitstrings $s_i$ and $t_i$. There has been significant recent work involving data structures that use near optimal space while supporting fast access [68, 91, 40, 29, 96, 57, 102, 51, 15, 103]. The dictionary problem in particular has been well-studied in the case of fixed-length keys. The information-theoretic lower bound for representing n elements from a universe U is $B = \lceil \log \binom{|U|}{n} \rceil$...

186 | Computing the n-dimensional Delaunay tessellation with applications to Voronoi polytopes
- Watson
- 1981
Citation Context: ...nd point distributions. We present experiments based on using our representation as part of incremental Delaunay algorithms in both 2D and 3D. We use a variant of the standard Bowyer-Watson algorithm [25, 131] and the exact arithmetic predicates of Shewchuk [111] for all geometric tests. We also present experiments based on a Delaunay refinement algorithm that removes triangles with small angles by adding ...

182 | Cache-conscious structure layout
- Chilimbi, Hill, et al.
- 1999
Citation Context: ...onds. a factor of 7.5 averaged over all the graphs (assuming the vertices are labeled with the separator ordering). The effect of insertion order has been previously reported (e.g. [84, page 268] and [36]) but the magnitude of the difference was surprising to us; the largest factor we have previously seen reported is about 4. We note that the magnitude is significantly less on the Pentium III with its ...

182 | Generalized nested dissection
- Lipton, Rose, et al.
- 1979
Citation Context: ...ne a graph to be separable if it is a member of a class that satisfies an $n^c$-separator theorem. A class of graphs has bounded density if every n-vertex member has O(n) edges. Lipton, Rose, and Tarjan [80] prove that any class of graphs that satisfies an $n/(\log n)^{1+\epsilon}$-separator theorem with $\epsilon > 0$ has bounded density. Hence separable graphs have bounded density. Another type of graph separator is an edg...

180 | Spectral compression of mesh geometry
- Karni, Gotsman
- 2000
Citation Context: ...128, 7, 124, 5], these algorithms can be much more complicated than their main-memory counterparts, and can be significantly slower. The field of compressed meshes has received considerable attention [44, 61, 121, 98, 105, 120, 70, 66, 53]. In three dimensions, for example, these methods can compress a tetrahedral mesh to less than a byte per tetrahedron [120], about 6 bytes/vertex (not including vertex coordinates). These techniques, h...

176 | Real time compression of triangle mesh connectivity
- Gumhold, Strasser
- 1998

175 | Geometry and Topology for Mesh Generation
- Edelsbrunner
- 2001
Citation Context: ...ent data structures for representing two and three dimensional simplicial meshes. (By a d simplicial mesh we mean a pure simplicial complex of dimension d, which is a manifold, possibly with boundary [49].) The data structures support standard operations on meshes including traversing among neighboring simplices, inserting and deleting simplices, and the ability to store data on simplices. For a class...

171 | Computing Dirichlet tessellations
- Bowyer
- 1981
Citation Context: ...nd point distributions. We present experiments based on using our representation as part of incremental Delaunay algorithms in both 2D and 3D. We use a variant of the standard Bowyer-Watson algorithm [25, 131] and the exact arithmetic predicates of Shewchuk [111] for all geometric tests. We also present experiments based on a Delaunay refinement algorithm that removes triangles with small angles by adding ...

170 | Space-efficient static trees and graphs
- Jacobson
- 1989
Citation Context: ...he cache. However, a data structure is only useful if it allows the application to perform fast queries (and updates) to the data. There has been considerable previous work on compact data structures [68, 91, 29, 46]. However, most of the previous work has been exclusively theoretical, in that the structures are too complex to implement or suffer from very high associated constant factors. Further, the compressio...

163 | The WebGraph framework I: Compression techniques
- Boldi, Vigna
- 2004
Citation Context: ...ompression is better for denser sets (as predicted by the space bound given above). Separable Graphs (Chapter 5). Recently there has been a great deal of interest in compact representations of graphs [125, 72, 65, 82, 64, 105, 92, 68, 91, 40, 46, 65, 28, 1, 119, 22]. Using difference coding it is possible to create several different compact representations for separable graphs. (A graph is defined to be separable if it and all its subgraphs can be partitioned ...

152 | Applications of random sampling
- Clarkson, Shor
- 1989
Citation Context: ...h on the current mesh. When a point p is inserted, the cavity is determined by a search starting from the face that contained p. To achieve optimal runtime bounds we use the idea of Clarkson and Shor [41] and maintain an association of every point p not yet inserted into the mesh with the face $t_p$ that contains p. The search for the cavity of p will start at $t_p$. Their algorithm keeps the history of the...

140 | A polyhedron representation for computer vision
- Baumgart
- 1975
Citation Context: ...turns the neighbor triangle, but returns in which of three orders it is held. There are many closely related data structures based on edges, including the doubly connected edge list [89], winged-edge [9], half-edge [133], and quad-edge [59] structures. In addition to triangulated meshes, these data structures can all be used for polygonal meshes. In these data structures each edge maintains pointers ...

140 | Succinct Representation of Balanced Parentheses and Static Trees
- Munro, Raman
Citation Context: ...he cache. However, a data structure is only useful if it allows the application to perform fast queries (and updates) to the data. There has been considerable previous work on compact data structures [68, 91, 29, 46]. However, most of the previous work has been exclusively theoretical, in that the structures are too complex to implement or suffer from very high associated constant factors. Further, the compressio...

137 | Randomized Search Trees
- Aragon, Seidel
- 1989
Citation Context: ...ves experimental results for the second representation. To show the versatility of the compression technique, we applied it to two separate data structures: red-black trees [60] and functional treaps [6]. 4.2 Representation With Dictionaries. Here we describe a representation for ordered sets based on our variable-bit-length dictionary from Section 3.3. We would like to represent ordered sets S of int...

134 | Adaptive precision floating-point arithmetic and fast robust geometric predicates
- Shewchuk
- 1996
Citation Context: ...ing our representation as part of incremental Delaunay algorithms in both 2D and 3D. We use a variant of the standard Bowyer-Watson algorithm [25, 131] and the exact arithmetic predicates of Shewchuk [111] for all geometric tests. We also present experiments based on a Delaunay refinement algorithm that removes triangles with small angles by adding new points at their circumcenters. All space is report...

128 | Dynamic perfect hashing: upper and lower bounds
- Dietzfelbinger, Karlin, et al.
- 1988

126 | The ISPD98 circuit benchmark suite
- Alpert
- 1998
Citation Context: ...ertex, it may be

| Graph | Vtxs | Edges | Max Degree | Source |
|---|---|---|---|---|
| auto | 448695 | 6629222 | 37 | 3D mesh [130] |
| feocean | 143437 | 819186 | 6 | 3D mesh [130] |
| m14b | 214765 | 3358036 | 40 | 3D mesh [130] |
| ibm17 | 185495 | 4471432 | 150 | circuit [3] |
| ibm18 | 210613 | 4443720 | 173 | circuit [3] |
| CA | 1971281 | 5533214 | 12 | street map [127] |
| PA | 1090920 | 3083796 | 9 | street map [127] |
| googleI | 916428 | 5105039 | 6326 | web links [56] |
| googleO | 916428 | 5105039 | 456 | web links [56] |

...

125 | Cuckoo hashing
- Pagh, Rodler
- 2004

123 | Overview of the Eighth Text REtrieval Conference
- Voorhees, Harman
- 2000
Citation Context: ...nd therefore close together in the numbering. This is similar to the graph-reordering algorithm of Chapter 5. We have implemented this idea and tested it on indexing data from the TREC-8 ad hoc track [129] (disks 4 and 5, excluding the Congressional Record). We tested a variety of codes in combination with difference coding. Our algorithm was able to improve the performance of the best compression tech...

121 | External-memory computational geometry
- Goodrich, Tsay, et al.
- 1993
Citation Context: ...external memory. To avoid thrashing, this requires designing algorithms for which the access to the mesh is carefully orchestrated. Although several such external memory algorithms have been designed [55, 45, 43, 83, 128, 7, 124, 5], these algorithms can be much more complicated than their main-memory counterparts, and can be significantly slower. The field of compressed meshes has received considerable attention [44, 61, 121, 9...

112 | Computational Aspects of VLSI
- Ullman
- 1984
Citation Context: ...tely they have to be laid out in two dimensions with only a small constant number of layers of connections. It is well understood that the size of the layout depends critically on the separator sizes [126]. Clearly certain graphs do not have good separators. Expander graphs by their very definition cannot have small separators. 5.2 Static Representation. We will consider three kinds of queries: degree q...

97 | List processing in real-time on a serial computer
- Baker
- 1978
Citation Context: ...on system used for this must be capable of allocating or freeing |s| bits of memory in time O(|s|/w), and may use O(|s|) space to keep track of each allocation. It is well known how to do this (e.g., [8]). Overview. We begin with an overview of our array structure. We partition the strings $a_i$ into blocks of contiguous elements, containing on average Θ(w) bits of data per block. We maintain the blocks...

94 | Representing geometric structures in d dimensions: topology and order
- Brisson
- 1989
Citation Context: ...acent faces rotating around its 3 edges, and 3 to the corner vertices). This corresponds to 18 pointers per tetrahedron. Weiler’s radial-edge representation [134], Brisson’s cell-tuple representation [26], and Lienhardt’s G-map representation [79] all take more space. In summary, the most efficient standard data structures of simplicial meshes use 6 pointers per triangle in 2D and 8 pointers per tetra...

85 | Primitives for the manipulation of three-dimensional subdivisions
- Dobkin, Laszlo
- 1989
Citation Context: ...eps). Such boundary representations are more general than the tetrahedron data structures, allowing the representation of polytope meshes, but tend to take significantly more space. Dobkin and Laszlo [48] suggest a data structure based on edge-face pairs, which in general requires 6 pointers per edge-face. For tetrahedral meshes this data structure can be optimized to 9 pointers per face (6 to the adj...

85 | Face Fixer: Compressing Polygon Meshes with
- Snoeyink
- 2001