Results 1  10
of
190
PrivacyPreserving Access of Outsourced Data via Oblivious RAM Simulation. ArXiv eprints, April 2011. Eprint 1007.1259v2
"... Suppose a client, Alice, has outsourced her data to an external storage provider, Bob, because he has capacity for her massive data set, of size n, whereas her private storage is much smaller—say, of size O(n1/r), for some constant r> 1. Alice trusts Bob to maintain her data, but she would like t ..."
Abstract

Cited by 66 (9 self)
 Add to MetaCart
(Show Context)
Suppose a client, Alice, has outsourced her data to an external storage provider, Bob, because he has capacity for her massive data set, of size n, whereas her private storage is much smaller—say, of size O(n1/r), for some constant r> 1. Alice trusts Bob to maintain her data, but she would like to keep its contents private. She can encrypt her data, of course, but she also wishes to keep her access patterns hidden from Bob as well. We describe schemes for the oblivious RAM simulation problem with a small logarithmic or polylogarithmic amortized increase in access times, with a very high probability of success, while keeping the external storage to be of size O(n). To achieve this, our algorithmic contributions include a parallel MapReduce cuckoohashing algorithm and an externalmemory dataoblivious sorting algorithm.
An Improved Construction for Counting Bloom Filters
 14th Annual European Symposium on Algorithms, LNCS 4168
, 2006
"... Abstract. A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashingbas ..."
Abstract

Cited by 65 (5 self)
 Add to MetaCart
(Show Context)
Abstract. A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashingbased alternative based on dleft hashing called a dleft CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally. 1
PrivacyPreserving Group Data Access via Stateless Oblivious RAM Simulation ∗
"... Motivated by cloud computing applications, we study the problem of providing privacypreserving access to an outsourced honestbutcurious data repository for a group of trusted users. We show how to achieve efficient privacypreserving data access using a combination of probabilistic encryption, wh ..."
Abstract

Cited by 61 (8 self)
 Add to MetaCart
(Show Context)
Motivated by cloud computing applications, we study the problem of providing privacypreserving access to an outsourced honestbutcurious data repository for a group of trusted users. We show how to achieve efficient privacypreserving data access using a combination of probabilistic encryption, which directly hides data values, and stateless oblivious RAM simulation, which hides the pattern of data accesses. We give a method with O(log n) amortized access overhead for simulating a RAM algorithm that has a memory of size n, using a scheme that is dataoblivious with very high probability. We assume that the simulation has access to a private workspace of size O(nν), for any given fixed constant ν> 0, but does not maintain state in between data access requests. Our simulation makes use of pseudorandom hash functions and is based on a novel hierarchy of cuckoo hash tables that all share a common stash. The method outperforms all previous techniques for stateless clients in terms of access overhead. We also provide experimental results from a prototype implementation of our scheme, showing its practicality. In addition, we show that one can eliminate the dependence on pseudorandom hash functions in our simulation while having the overhead rise to be O(log 2 n). 1
Space Efficient Hash Tables With Worst Case Constant Access Time
 In STACS
, 2003
"... We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes ..."
Abstract

Cited by 60 (4 self)
 Add to MetaCart
(Show Context)
We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes and the expected amortized insertion time is constant. This is the first dictionary that has worst case constant access time and expected constant update time, works with (1 + ffl) n space, and supports satellite information. Experiments indicate that d = 4 choices suffice for ffl 0:03. We also describe variants of the data structure that allow the use of hash functions that can be evaluted in constant time.
SILT: A MemoryEfficient, HighPerformance KeyValue Store
 In Proc. 23rd ACM SOSP, Cascias
, 2011
"... SILT (Small Index Large Table) is a memoryefficient, highperformance keyvalue store system based on flash storage that scales to serve billions of keyvalue items on a single node. It requires only 0.7 bytes of DRAM per entry and retrieves key/value pairs using on average 1.01 flash reads each. S ..."
Abstract

Cited by 53 (14 self)
 Add to MetaCart
(Show Context)
SILT (Small Index Large Table) is a memoryefficient, highperformance keyvalue store system based on flash storage that scales to serve billions of keyvalue items on a single node. It requires only 0.7 bytes of DRAM per entry and retrieves key/value pairs using on average 1.01 flash reads each. SILT combines new algorithmic and systems techniques to balance the use of memory, storage, and computation. Our contributions include: (1) the design of three basic keyvalue stores each with a different emphasis on memoryefficiency and writefriendliness; (2) synthesis of the basic keyvalue stores to build a SILT keyvalue store system; and (3) an analytical model for tuning system parameters carefully to meet the needs of different workloads. SILT requires one to two orders of magnitude less memory to provide comparable throughput to current highperformance keyvalue systems on a commodity desktop system with flash storage.
Oblivious RAM simulation with efficient worstcase access overhead
 In Proc. ACM Workshop on Cloud Computing Security (CCSW
, 2011
"... Oblivious RAM simulation is a method for achieving confidentiality and privacy in cloud computing environments. It involves obscuring the access patterns to a remote storage so that the manager of that storage cannot infer information about its contents. Existing solutions typically involve small am ..."
Abstract

Cited by 51 (5 self)
 Add to MetaCart
(Show Context)
Oblivious RAM simulation is a method for achieving confidentiality and privacy in cloud computing environments. It involves obscuring the access patterns to a remote storage so that the manager of that storage cannot infer information about its contents. Existing solutions typically involve small amortized overheads for achieving this goal, but nevertheless involve potentially huge variations in access times, depending on when they occur. In this paper, we show how to deamortize oblivious RAM simulations, so that each access takes a worstcase bounded amount of time. 1
Why simple hash functions work: Exploiting the entropy in a data stream
 In Proceedings of the 19th Annual ACMSIAM Symposium on Discrete Algorithms
, 2008
"... Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealiz ..."
Abstract

Cited by 50 (9 self)
 Add to MetaCart
(Show Context)
Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2universal or O(1)wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing. In practice, however, it is commonly observed that weak hash functions, including 2universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source, ” whereby
ShoreMT: A Scalable Storage Manager for the Multicore Era
 EXTENDING DATABASE TECHNOLOGY (EDBT)
, 2009
"... Database storage managers have long been able to efficiently handle multiple concurrent requests. Until recently, however, a computer contained only a few singlecore CPUs, and therefore only a few transactions could simultaneously access the storage manager's internal structures. This allowed ..."
Abstract

Cited by 49 (14 self)
 Add to MetaCart
(Show Context)
Database storage managers have long been able to efficiently handle multiple concurrent requests. Until recently, however, a computer contained only a few singlecore CPUs, and therefore only a few transactions could simultaneously access the storage manager's internal structures. This allowed storage managers to use nonscalable approaches without any penalty. With the arrival of multicore chips, however, this situation is rapidly changing. More and more threads can run in parallel, stressing the internal scalability of the storage manager. Systems optimized for high performance at a limited number of cores are not assured similarly high performance at a higher core count, because unanticipated scalability obstacles arise. We benchmark four popular opensource storage managers (Shore, BerkeleyDB, MySQL, and PostgreSQL) on a modern multicore machine, and find that they all suffer in terms of scalability. We briefly examine the bottlenecks in the various storage engines. We then present ShoreMT, a multithreaded and highly scalable version of Shore which we developed by identifying and successively removing internal bottlenecks. When compared to other DBMS, ShoreMT exhibits superior scalability and 24 times higher absolute throughput than its peers. We also show that designers should favor scalability to singlethread performance, and highlight important principles for writing scalable storage engines, illustrated with real examples from the development of ShoreMT.
Implementing signatures for transactional memory
 40th Intl. Symp. on Microarchitecture
, 2007
"... Transactional Memory (TM) systems must track the read and write sets—items read and written during a transaction—to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardware at a performance cost of false positives (conf ..."
Abstract

Cited by 49 (7 self)
 Add to MetaCart
(Show Context)
Transactional Memory (TM) systems must track the read and write sets—items read and written during a transaction—to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardware at a performance cost of false positives (conflicts detected when none exists). This paper examines different organizations to achieve hardwareefficient and accurate TM signatures. First, we find that implementing each signature with a single khashfunction Bloom filter (True Bloom signature) is inefficient, as it requires multiported SRAMs. Instead, we advocate using k singlehashfunction Bloom filters in parallel (Parallel Bloom signature), using areaefficient singleported SRAMs. Our formal analysis shows that both organizations perform equally well in theory and our simulationbased evaluation shows this to hold approximately in practice. We also show that by choosing highquality hash functions we can achieve signature designs noticeably more accurate than the previously proposed implementations. Finally, we adapt Pagh and Rodler’s cuckoo hashing to implement CuckooBloom signatures. While this representation does not support set intersection, it mitigates false positives for the common case of small read/write sets and performs like a Bloom filter for large sets. 1.
Design and evaluation of main memory hash join algorithms for multicore CPUs
 In SIGMOD Conference
, 2011
"... The focus of this paper is on investigating efficient hash join algorithms for modern multicore processors in main memory environments. This paper dissects each internal phase of a typical hash join algorithm and considers different alternatives for implementing each phase, producing a family of ha ..."
Abstract

Cited by 48 (3 self)
 Add to MetaCart
(Show Context)
The focus of this paper is on investigating efficient hash join algorithms for modern multicore processors in main memory environments. This paper dissects each internal phase of a typical hash join algorithm and considers different alternatives for implementing each phase, producing a family of hash join algorithms. Then, we implement these main memory algorithms on two radically different modern multiprocessor systems, and carefully examine the factors that impact the performance of each method. Our analysis reveals some interesting results – a very simple hash join algorithm is very competitive to the other more complex methods. This simple join algorithm builds a shared hash table and does not partition the input relations. Its simplicity implies that it requires fewer parameter settings, thereby making it far easier for query optimizers and execution engines to use it in practice. Furthermore, the performance of this simple algorithm improves dramatically as the skew in the input data increases, and it quickly starts to outperform all other algorithms. Based on our results, we propose that database implementers consider adding this simple join algorithm to their repertoire of main memory join algorithms, or adapt their methods to mimic the strategy employed by this algorithm, especially when joining inputs with skewed data distributions.