Results 11–20 of 32
Private Search in the Real World
Abstract
Cited by 3 (0 self)
Encrypted search — performing queries on protected data — has been explored in the past; however, its inherent inefficiency has raised questions of practicality. Here, we focus on improving its performance and extending its functionality enough to make it practical. We do this by optimizing the system, and by stepping back from the goal of achieving maximal privacy guarantees in an encrypted search scenario, treating efficiency and functionality as priorities. We design and analyze the privacy implications of two practical extensions applicable to any keyword-based private search system. We evaluate their efficiency by building them on top of a private search system called SADS. Additionally, we improve SADS's performance, privacy guarantees, and functionality. The extended SADS system offers improved efficiency parameters that meet practical usability requirements in a relaxed adversarial model. We present experimental results and evaluate the performance of the system. We also demonstrate analytically that our scheme can meet the basic needs of a major hospital complex's admissions records. Overall, we achieve performance comparable to a simply configured MySQL database system.
An Improved Analysis of the Lossy Difference Aggregator
Abstract
Cited by 2 (0 self)
We provide a detailed analysis of the Lossy Difference Aggregator, a recently developed data structure for measuring latency in a router environment where packet losses can occur. Our analysis provides stronger performance bounds than those given originally, and leads us to a model for how to optimize the parameters for the data structure when the loss rate is not known in advance by using competitive analysis.
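The data structure the abstract analyzes can be illustrated with a minimal, hypothetical Python model (class and function names are invented for the example): each endpoint hashes packets into banks holding a packet count and a timestamp sum, and latency is averaged over banks whose sender and receiver counts agree, i.e., banks untouched by loss.

```python
class LDABank:
    """One bank: a packet count and a running timestamp sum."""
    def __init__(self):
        self.count = 0
        self.ts_sum = 0.0

class LDA:
    """One endpoint's half of a Lossy Difference Aggregator (simplified
    sketch): each packet is hashed into a bank by its identifier."""
    def __init__(self, num_banks, seed=0):
        self.banks = [LDABank() for _ in range(num_banks)]
        self.seed = seed

    def record(self, packet_id, timestamp):
        bank = self.banks[hash((self.seed, packet_id)) % len(self.banks)]
        bank.count += 1
        bank.ts_sum += timestamp

def estimate_latency(sender, receiver):
    """Average delay over banks whose counts match (no loss in the bank);
    returns None if every bank was affected by loss."""
    total_delay, total_count = 0.0, 0
    for s, r in zip(sender.banks, receiver.banks):
        if s.count == r.count and s.count > 0:
            total_delay += r.ts_sum - s.ts_sum
            total_count += s.count
    return total_delay / total_count if total_count else None
```

A lost packet corrupts only the one bank it hashed to; the remaining banks still yield an unbiased average, which is the tradeoff the analysis optimizes.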
Sample-Optimal Average-Case Sparse Fourier Transform in Two Dimensions. Unpublished manuscript
, 2012
Abstract
Cited by 2 (1 self)
We present the first sample-optimal sublinear-time algorithms for the sparse Discrete Fourier Transform over a two-dimensional √n × √n grid. Our algorithms are analyzed for average-case signals. For signals whose spectrum is exactly sparse, our algorithms use O(k) samples and run in O(k log k) time, where k is the expected sparsity of the signal. For signals whose spectrum is approximately sparse, our algorithm uses O(k log n) samples and runs in O(k log² n) time; the latter algorithm works for k = Θ(√n). The number of samples used by our algorithms matches the known lower bounds for the respective signal models. By a known reduction, our algorithms give similar results for the one-dimensional sparse Discrete Fourier Transform when n is a power of a small composite number (e.g., n = 6^t).
Software Defined Traffic Measurement with OpenSketch
Abstract
Cited by 2 (1 self)
Most network management tasks in software-defined networks (SDN) involve two stages: measurement and control. While many efforts have focused on network control APIs for SDN, little attention has gone into measurement. The key challenge of designing a new measurement API is to strike a careful balance between generality (supporting a wide variety of measurement tasks) and efficiency (enabling high link speed and low cost). We propose OpenSketch, a software-defined traffic measurement architecture that separates the measurement data plane from the control plane. In the data plane, OpenSketch provides a simple three-stage pipeline (hashing, filtering, and counting) which can be implemented with commodity switch components and supports many measurement tasks. In the control plane, OpenSketch provides a measurement library that automatically configures the pipeline and allocates resources for different measurement tasks. Our evaluations on real-world packet traces, our prototype on NetFPGA, and the implementation of five measurement tasks on top of OpenSketch demonstrate that OpenSketch is general, efficient, and easily programmable.
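The three-stage pipeline (hashing, filtering, counting) can be sketched as a small software model. This is a hypothetical illustration, not OpenSketch's actual API; the packet representation and predicate are invented for the example.

```python
import hashlib

class ThreeStagePipeline:
    """Hypothetical software model of a hashing / filtering / counting
    measurement pipeline."""
    def __init__(self, num_counters, keep):
        self.counters = [0] * num_counters
        self.keep = keep  # filtering predicate configured per task

    def process(self, packet):
        # Stage 1: hashing -- map the flow key to a counter index.
        digest = hashlib.sha1(packet["src"].encode()).hexdigest()
        idx = int(digest, 16) % len(self.counters)
        # Stage 2: filtering -- drop packets this task does not care about.
        if not self.keep(packet):
            return
        # Stage 3: counting -- update the selected counter.
        self.counters[idx] += packet["bytes"]
```

For instance, a task counting HTTP bytes per source would configure the filter as `keep=lambda pkt: pkt["dport"] == 80`; the control plane's job in the paper is to pick these configurations and counter allocations automatically.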
Some Open Questions Related to Cuckoo Hashing
Abstract
Cited by 2 (1 self)
The purpose of this brief note is to describe recent work in the area of cuckoo hashing, including a clear description of several open problems, with the hope of spurring further research.
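For readers new to the technique, here is a minimal two-table cuckoo hash sketch in its standard textbook form (not taken from the note itself): each key has one candidate slot per table, and an insert evicts the current occupant and relocates it, up to a bounded number of kicks.

```python
class CuckooHash:
    """Minimal two-table cuckoo hash table sketch."""
    def __init__(self, size=211):
        self.size = size
        self.t1 = [None] * size
        self.t2 = [None] * size

    def _h1(self, key):
        return hash((0x9E3779B9, key)) % self.size

    def _h2(self, key):
        return hash((0x85EBCA6B, key)) % self.size

    def lookup(self, key):
        # A key can only live in one of its two candidate slots.
        return self.t1[self._h1(key)] == key or self.t2[self._h2(key)] == key

    def insert(self, key, max_kicks=50):
        if self.lookup(key):
            return True
        for _ in range(max_kicks):
            i = self._h1(key)
            key, self.t1[i] = self.t1[i], key  # place key, take occupant
            if key is None:
                return True
            j = self._h2(key)
            key, self.t2[j] = self.t2[j], key  # relocate evicted key
            if key is None:
                return True
        return False  # eviction chain too long: a real table would rehash
```

Lookups probe exactly two slots, which is the worst-case O(1) query guarantee that makes cuckoo hashing attractive; the open problems the note surveys largely concern insertion behavior and load thresholds.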
Cache-Oblivious Hashing
, 2010
Abstract
Cited by 1 (0 self)
The hash table, especially its external-memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key takes only an expected average of t_q = 1 + 1/2^{Ω(b)} disk accesses for any load factor α bounded away from 1. However, such near-perfect performance is achieved only when b is known and the hash table is specifically tuned for working with such a blocking. In this paper we study whether it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table would automatically perform well across all levels of the memory hierarchy and would not need any hardware-specific tuning, an important feature in autonomous databases. We first show that linear probing, a classical collision-resolution strategy for hash tables, can easily be made cache-oblivious, but it only achieves t_q = 1 + O(α/b). Then we demonstrate that it is possible to obtain t_q = 1 + 1/2^{Ω(b)}, thus matching the cache-aware bound, if the following two conditions hold: (a) b is a power of 2; and (b) every block starts at a memory address divisible by b. Both conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either is removed, the best obtainable bound is t_q = 1 + O(α/b), which is exactly what linear probing achieves.
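Linear probing, the strategy the paper starts from, can be sketched in a few lines. The point relevant to cache-obliviousness is visible in the code: probes scan consecutive slots, so under any blocking of size b a search touches roughly 1 + O(α/b) blocks, whatever b turns out to be.

```python
class LinearProbingTable:
    """Plain linear probing sketch: hash to a start slot, then scan
    consecutive slots until the key or an empty slot is found."""
    def __init__(self, size):
        self.slots = [None] * size

    def _start(self, key):
        return hash(key) % len(self.slots)

    def insert(self, key):
        i = self._start(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return
            i = (i + 1) % len(self.slots)  # sequential probe sequence
        raise RuntimeError("table full")

    def contains(self, key):
        i = self._start(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None:
                return False  # an empty slot ends the probe run
            if self.slots[i] == key:
                return True
            i = (i + 1) % len(self.slots)
        return False
```

The paper's contribution is showing that this sequential-scan behavior gives 1 + O(α/b) for free, and that matching the stronger 1 + 1/2^{Ω(b)} cache-aware bound requires the two alignment conditions it identifies.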
External-Memory Multimaps
, 2011
Abstract
Cited by 1 (1 self)
Many data structures support dictionaries, also known as maps or associative arrays, which store and manage a set of key-value pairs. A multimap is a generalization that allows multiple values to be associated with the same key. For example, the inverted file data structure that is used prevalently in the infrastructure supporting search engines is a type of multimap, where words are used as keys and document pointers are used as values. We study the multimap abstract data type and how it can be implemented efficiently online in external-memory frameworks, with constant expected I/O performance. The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multi-queue implementation to cope with varying numbers of values per key. Our external-memory results are for the standard two-level memory model.
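As a point of reference, the multimap abstract data type itself is tiny; a minimal in-memory sketch follows. (The paper's contribution is the external-memory implementation with constant expected I/Os, which this sketch does not attempt.)

```python
from collections import defaultdict

class Multimap:
    """Minimal in-memory multimap: each key maps to a list of values."""
    def __init__(self):
        self._data = defaultdict(list)

    def insert(self, key, value):
        self._data[key].append(value)

    def get(self, key):
        """All values associated with key, in insertion order."""
        return list(self._data[key])

    def remove(self, key, value):
        self._data[key].remove(value)
```

Used as an inverted file, keys would be words and values document pointers, e.g. `m.insert("hash", doc_id)`; the external-memory challenge the paper addresses is exactly that these value lists vary wildly in length.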
Improved Concentration Bounds for CountSketch
Abstract
Cited by 1 (1 self)
We present a refined analysis of the classic CountSketch streaming heavy-hitters algorithm [CCF02]. CountSketch uses O(k log n) linear measurements of a vector x ∈ ℝⁿ to give an estimate x̂ of x. The standard analysis shows that this estimate x̂ satisfies ‖x̂ − x‖∞² < ‖x[k]‖₂²/k, where x[k] denotes the vector x with its k largest coordinates removed.
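The estimator the bound refers to is the standard CountSketch: each coordinate is hashed to one counter per row with a random sign, and the estimate is the median of the signed counters. A minimal sketch, with hash and sign functions simulated from per-row seeds (an illustrative construction, not the paper's):

```python
import random
import statistics

class CountSketch:
    """Textbook CountSketch of a vector x: `rows` rows of `width`
    counters; estimates are medians of signed counters across rows."""
    def __init__(self, rows, width, seed=0):
        rng = random.Random(seed)
        self.rows, self.width = rows, width
        # One (bucket-seed, sign-seed) pair per row.
        self.seeds = [(rng.random(), rng.random()) for _ in range(rows)]
        self.table = [[0.0] * width for _ in range(rows)]

    def _bucket(self, r, i):
        return hash((self.seeds[r][0], i)) % self.width

    def _sign(self, r, i):
        return 1 if hash((self.seeds[r][1], i)) % 2 == 0 else -1

    def update(self, i, delta):
        """Apply x_i += delta to the sketch (a linear measurement)."""
        for r in range(self.rows):
            self.table[r][self._bucket(r, i)] += self._sign(r, i) * delta

    def estimate(self, i):
        """Median over rows of the signed counter for coordinate i."""
        return statistics.median(
            self._sign(r, i) * self.table[r][self._bucket(r, i)]
            for r in range(self.rows))
```

Each row's counter equals x_i plus signed noise from colliding coordinates; the median across rows is what the ℓ∞ error bound above controls.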
Biff (Bloom Filter) Codes: Fast Error Correction for Large Data Sets
Abstract
Cited by 1 (0 self)
Large data sets are increasingly common in cloud and virtualized environments. For example, transfers of multiple gigabytes are commonplace, as are replicated blocks of such sizes. There is a need for fast error correction or data reconciliation in such settings even when the expected number of errors is small. Motivated by such cloud reconciliation problems, we consider error-correction schemes designed for large data, after explaining why previous approaches appear unsuitable. We introduce Biff codes, which are based on Bloom filters and are designed for large data. For Biff codes with a message of length L and E errors, the encoding time is O(L), decoding time is O(L + E), and the space overhead is O(E). Biff codes are low-density parity-check codes; they are similar to Tornado codes, but are designed for errors instead of erasures. Further, Biff codes are designed to be very simple, removing any explicit graph structures and being based entirely on hash tables. We derive Biff codes by a simple reduction from a set reconciliation algorithm for a recently developed data structure, invertible Bloom lookup tables. While the underlying theory is extremely simple, what makes this code especially attractive is the ease with which it can be implemented and the speed of decoding. We present results from a prototype implementation that decodes messages of 1 million words with thousands of errors in well under a second.
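The invertible Bloom lookup table that Biff codes are reduced from can be sketched for insert-only listing as follows. This is a simplified model over integer keys: each key is hashed into a few cells, each cell keeps only a count and an XOR of its keys, and cells holding exactly one key are "peeled" repeatedly to recover the stored set.

```python
import hashlib

class IBLT:
    """Minimal invertible Bloom lookup table over integer keys."""
    def __init__(self, cells=64, hashes=3):
        self.count = [0] * cells
        self.key_xor = [0] * cells
        self.hashes = hashes

    def _cells(self, key):
        return [int(hashlib.sha1(f"{h}:{key}".encode()).hexdigest(), 16)
                % len(self.count) for h in range(self.hashes)]

    def insert(self, key):
        for c in self._cells(key):
            self.count[c] += 1
            self.key_xor[c] ^= key

    def delete(self, key):
        for c in self._cells(key):
            self.count[c] -= 1
            self.key_xor[c] ^= key

    def peel(self):
        """List the stored keys by repeatedly removing singleton cells
        (count == 1, so key_xor is exactly one key)."""
        recovered = []
        progress = True
        while progress:
            progress = False
            for c in range(len(self.count)):
                if self.count[c] == 1:
                    key = self.key_xor[c]
                    recovered.append(key)
                    self.delete(key)  # may create new singletons
                    progress = True
        return recovered
```

Peeling succeeds with high probability when the cell count is a constant factor larger than the number of keys; that same peeling process, applied to the symmetric difference between sent and received data, is what makes Biff decoding fast.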
Hash-Based Data Structures for Extreme Conditions
, 2008
Abstract
This thesis is about the design and analysis of Bloom filter and multiple-choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both analyzable and practical. First, we show that a wide class of Bloom filter variants can be effectively implemented using very easily computable combinations of only two fully random hash functions. From a theoretical perspective, these results show that Bloom filters and related data structures can often be substantially derandomized with essentially no loss in performance. From a practical perspective, this derandomization allows for a significant speedup in certain query-intensive applications. The rest of this work focuses on designing space-efficient, open-addressed, multiple-choice hash tables for implementation in high-performance router hardware. Using multiple hash functions conserves space, but requires every hash table operation to consider multiple hash buckets, forcing a tradeoff between the slow speed of examining these buckets serially …
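The "combinations of only two fully random hash functions" mentioned above are commonly instantiated as double hashing, where the i-th probe is g_i(x) = (h1(x) + i·h2(x)) mod m. A sketch of a Bloom filter built this way, with salted SHA-1 standing in for the two random hash functions (an illustrative choice, not the thesis's implementation):

```python
import hashlib

class DoubleHashBloom:
    """Bloom filter whose k probe positions are derived from just two
    hash evaluations via g_i(x) = (h1(x) + i*h2(x)) mod m."""
    def __init__(self, m=1024, k=4):
        self.bits = [False] * m
        self.k = k

    def _h(self, item, salt):
        digest = hashlib.sha1(f"{salt}:{item}".encode()).hexdigest()
        return int(digest, 16) % len(self.bits)

    def _probes(self, item):
        h1, h2 = self._h(item, 1), self._h(item, 2)
        return [(h1 + i * h2) % len(self.bits) for i in range(self.k)]

    def add(self, item):
        for p in self._probes(item):
            self.bits[p] = True

    def __contains__(self, item):
        return all(self.bits[p] for p in self._probes(item))
```

The practical appeal matches the thesis's claim: membership queries cost two hash evaluations instead of k, with (as the cited results show) essentially no loss in false-positive performance.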