Transactional storage for geo-replicated systems
In SOSP, 2011.
"... We describe the design and implementation of Walter, a key-value store that supports transactions and replicates data across distant sites. A key feature behind Walter is a new property called Parallel Snapshot Isolation (PSI). PSI allows Walter to replicate data asynchronously, while providing stro ..."
Cited by 96 (4 self).
We describe the design and implementation of Walter, a key-value store that supports transactions and replicates data across distant sites. A key feature behind Walter is a new property called Parallel Snapshot Isolation (PSI). PSI allows Walter to replicate data asynchronously, while providing strong guarantees within each site. PSI precludes write-write conflicts, so that developers need not worry about conflict-resolution logic. To prevent write-write conflicts and implement PSI, Walter uses two new and simple techniques: preferred sites and counting sets. We use Walter to build a social networking application and port a Twitter-like application.
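As a rough illustration of the paper's counting sets (the conflict-free container Walter uses to avoid write-write conflicts), here is a minimal Python sketch; the class and method names are ours, not the paper's. Each element maps to a signed count, add and remove commute, and merging replicas is just summing counts:

    # Minimal sketch of a counting set (cset) in the spirit of Walter's
    # conflict-free container; API names are illustrative, not the paper's.
    from collections import defaultdict

    class CountingSet:
        def __init__(self):
            self.counts = defaultdict(int)  # element -> signed count

        def add(self, elem):
            self.counts[elem] += 1

        def remove(self, elem):
            self.counts[elem] -= 1          # may go negative; that's allowed

        def __contains__(self, elem):
            return self.counts[elem] > 0    # member iff count is positive

        def merge(self, other):
            # Applying updates from another site is just summing counts,
            # so replicas converge regardless of delivery order.
            for elem, c in other.counts.items():
                self.counts[elem] += c

    # Two sites update the same friends list concurrently:
    a, b = CountingSet(), CountingSet()
    a.add("alice"); b.add("alice"); b.remove("alice")
    a.merge(b)
    print("alice" in a)  # True: counts sum to 1, no conflict logic needed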
Database replication using generalized snapshot isolation
In Proceedings of the IEEE Symposium on Reliable Distributed Systems (SRDS), 2005.
"... Generalized snapshot isolation extends snapshot isola-tion as used in Oracle and other databases in a manner suit-able for replicated databases. While (conventional) snap-shot isolation requires that transactions observe the “lat-est ” snapshot of the database, generalized snapshot iso-lation allows ..."
Cited by 86 (14 self).
Generalized snapshot isolation extends snapshot isolation as used in Oracle and other databases in a manner suitable for replicated databases. While (conventional) snapshot isolation requires that transactions observe the “latest” snapshot of the database, generalized snapshot isolation allows the use of “older” snapshots, facilitating a replicated implementation. We show that many of the desirable properties of snapshot isolation remain. In particular, read-only transactions never block or abort, and they do not cause update transactions to block or abort. Moreover, under certain assumptions on the transaction workload, the execution is serializable. An implementation of generalized snapshot isolation can choose which past snapshot it uses. An interesting choice for a replicated database is prefix-consistent snapshot isolation, in which the snapshot contains at least all the writes of locally committed transactions. We present two implementations of prefix-consistent snapshot isolation. We conclude with an analytical performance model of one implementation, demonstrating the benefits, in particular reduced latency for read-only transactions, and showing that the potential downsides, in particular the change in abort rate of update transactions, are limited.
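To make the commit rule concrete, here is a toy Python sketch of first-committer-wins validation under generalized snapshot isolation, assuming a single global version counter (a simplification; the paper's model is more general): a transaction that read snapshot s may commit only if no overlapping write committed after s:

    # Toy GSI commit check (simplified): committed holds (commit_ts,
    # writeset) pairs; a transaction that read snapshot snap_ts may
    # commit only if no overlapping write landed in (snap_ts, now].
    committed = []   # [(commit_ts, set_of_keys)]
    clock = 0

    def try_commit(snap_ts, writeset):
        global clock
        for ts, ws in committed:
            if ts > snap_ts and ws & writeset:
                return None            # first-committer-wins: abort
        clock += 1
        committed.append((clock, set(writeset)))
        return clock                   # snapshot produced by this commit

    t1 = try_commit(snap_ts=0, writeset={"x"})   # commits at ts 1
    t2 = try_commit(snap_ts=0, writeset={"x"})   # aborts: missed t1's write
    print(t1, t2)  # 1 None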
Tashkent: Uniting Durability with Transaction Ordering for High-Performance Scalable Database Replication
In EuroSys, 2006.
"... In stand-alone databases, the functions of ordering the transaction commits and making the effects of transactions durable are performed in one single action, namely the writing of the commit record to disk. For efficiency many of these writes are grouped into a single disk operation. In replicated ..."
Cited by 49 (8 self).
In stand-alone databases, the functions of ordering the transaction commits and making the effects of transactions durable are performed in one single action, namely the writing of the commit record to disk. For efficiency, many of these writes are grouped into a single disk operation. In replicated databases in which all replicas agree on the commit order of update transactions, these two functions are typically separated. Specifically, the replication middleware determines the global commit order, while the database replicas make the transactions durable. The contribution of this paper is to demonstrate that this separation causes a significant scalability bottleneck: it forces some of the commit records to be written to disk serially, where in a stand-alone system they could have been grouped together in a single disk write. Two solutions are possible: (1) move durability from the database to the replication middleware, or (2) keep durability in the database and pass the global commit order from the replication middleware to the database. We implement both solutions. Tashkent-MW is a pure middleware solution that combines durability and ordering in the middleware, and treats an unmodified database as a black box. In Tashkent-API, we modify the database API so that the middleware can specify the commit order to the database, thus combining ordering and durability inside the database. We compare both Tashkent systems to an otherwise identical replicated system, called Base, in which ordering and durability remain separated. Under high update transaction loads, both Tashkent systems greatly outperform Base in throughput and response time.
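A minimal Python sketch of solution (1), combining ordering and durability in the middleware as Tashkent-MW does (file layout and names are illustrative): commit records are appended in the agreed global order and a whole batch is made durable with one fsync, instead of forcing serial per-transaction writes:

    # Sketch of group commit in the replication middleware (Tashkent-MW
    # idea, heavily simplified): commit records accumulate, then one
    # flush writes them in global commit order and fsyncs once.
    import os

    class DurableLog:
        def __init__(self, path):
            self.f = open(path, "ab")
            self.pending = []

        def commit(self, global_order, record):
            self.pending.append((global_order, record))

        def flush(self):
            # Write all pending records in global commit order,
            # then make them all durable with a single fsync.
            for order, record in sorted(self.pending):
                self.f.write(f"{order}:{record}\n".encode())
            self.f.flush()
            os.fsync(self.f.fileno())   # one disk operation for the batch
            self.pending.clear()

    log = DurableLog("/tmp/commit.log")   # path is hypothetical
    for i, txn in enumerate(["T1", "T2", "T3"]):
        log.commit(i, txn)
    log.flush()   # three ordered commits, one durable write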
A Comparative Evaluation of Transparent Scaling Techniques for Dynamic Content Servers
In ICDE, 2005.
"... We study several transparent techniques for scaling dynamic content web sites, and we evaluate their relative impact when used in combination. Full transparency implies strong data consistency as perceived by the user, no modifications to existing dynamic content site tiers and no additional program ..."
Cited by 44 (4 self).
We study several transparent techniques for scaling dynamic content web sites, and we evaluate their relative impact when used in combination. Full transparency implies strong data consistency as perceived by the user, no modifications to existing dynamic content site tiers, and no additional programming effort from the user or site administrator upon deployment.
Tashkent+: Memory-aware load balancing and update filtering in replicated databases
In Proceedings of the 2nd European Conference on Computer Systems (EuroSys), 2007.
"... We present a memory-aware load balancing (MALB) technique to dispatch transactions to replicas in a replicated database. Our MALB algorithm exploits knowledge of the working sets of transactions to assign them to replicas in such a way that they execute in main memory, thereby reducing disk I/O. In ..."
Cited by 37 (7 self).
We present a memory-aware load balancing (MALB) technique to dispatch transactions to replicas in a replicated database. Our MALB algorithm exploits knowledge of the working sets of transactions to assign them to replicas in such a way that they execute in main memory, thereby reducing disk I/O. In support of MALB, we introduce a method to estimate the size and the contents of transaction working sets. We also present an optimization called update filtering that reduces the overhead of update propagation between replicas. We show that MALB greatly improves performance over other load balancing techniques – such as round robin, least connections, and locality-aware request distribution (LARD) – that do not use explicit information on how transactions use memory. In particular, LARD demonstrates good performance for read-only static content Web workloads, but it gives performance inferior to MALB for database workloads as it does not efficiently handle large requests. MALB combined with update filtering further boosts performance over LARD. We build a prototype replicated system, called Tashkent+, with which we demonstrate that MALB and update filtering improve performance on the TPC-W and RUBiS benchmarks. In particular, in a 16-replica cluster using the ordering mix of TPC-W, MALB doubles the throughput over least connections and improves throughput 52% over LARD. MALB with update filtering further improves throughput to triple that of least connections and more than double that of LARD. Our techniques exhibit super-linear speedup; the throughput of the 16-replica cluster is 37 times the peak throughput of a standalone database due to better use of the cluster’s memory.
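As a sketch of the core idea (the paper's actual algorithm and working-set estimator are more elaborate), the following Python snippet packs transaction types onto replicas by estimated working-set size using first-fit decreasing, so each replica's assigned working sets fit in its memory:

    # Sketch of memory-aware load balancing (MALB), simplified: pack
    # transaction types onto replicas so each replica's combined working
    # set fits in RAM, keeping execution in memory instead of on disk.
    def assign(workloads, replica_mem, n_replicas):
        # workloads: {txn_type: estimated working-set size in MB}
        replicas = [{"types": [], "used": 0} for _ in range(n_replicas)]
        for txn_type, size in sorted(workloads.items(), key=lambda kv: -kv[1]):
            # First-fit decreasing: prefer a replica with room; otherwise
            # fall back to the least-loaded one.
            target = next((r for r in replicas
                           if r["used"] + size <= replica_mem),
                          min(replicas, key=lambda r: r["used"]))
            target["types"].append(txn_type)
            target["used"] += size
        return replicas

    # Hypothetical transaction types and sizes, two 1 GB replicas:
    print(assign({"browse": 600, "order": 300, "search": 500}, 1024, 2))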
Exploiting distributed version concurrency in a transactional memory cluster
In PPoPP, 2006.
"... We investigate a transactional memory runtime system providing scaling and strong consistency for generic C++ and SQL applications on commodity clusters. We introduce a novel page-level distributed concurrency control algorithm, called Distributed Multiversioning (DMV). DMV automatically detects and ..."
Cited by 32 (3 self).
We investigate a transactional memory runtime system providing scaling and strong consistency for generic C++ and SQL applications on commodity clusters. We introduce a novel page-level distributed concurrency control algorithm, called Distributed Multiversioning (DMV). DMV automatically detects and resolves conflicts caused by data races for distributed transactions accessing shared in-memory data structures. DMV’s key novelty is in exploiting the distributed data versions that naturally occur in a replicated cluster in order to avoid read-write conflicts. Specifically, DMV runs conflicting transactions in parallel on different replicas, instead of using different physical data copies within a single node as in classic multiversioning. In its most general update-anywhere configuration, DMV can be used to implement a software transactional memory abstraction for classic distributed shared memory applications. DMV also supports scaling for highly multithreaded database applications by centralizing updates on a master replica and creating the required page versions for read-only transactions on a set of slaves. In this DMV configuration, a version-aware scheduling technique distributes the read-only transactions across the slaves in such a way as to minimize version conflicts. In our evaluation, we use DMV as a lightweight approach to scaling a hash table microbenchmark workload and the industry-standard e-commerce workload of the TPC-W benchmark on a commodity cluster. Our measurements show scaling for both benchmarks. In particular, we show near-linear scaling up to 8 transactional nodes for the most common e-commerce workload, the TPC-W shopping mix. We further show that our scaling for the TPC-W e-commerce benchmark compares favorably with that of an existing coarse-grained asynchronous replication technique.
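A small Python sketch of version-aware scheduling in the master/slave configuration, under simplifying assumptions (scalar versions, perfectly known slave state): route a read-only transaction to a slave that has already applied the snapshot it needs, preferring the least advanced such slave to spread load:

    # Sketch of version-aware read scheduling (simplified): a read-only
    # transaction needs the snapshot produced by some master update;
    # send it to a slave that has already applied that version.
    def schedule_read(required_version, slave_versions):
        # slave_versions: {slave_id: last version applied on that slave}
        ready = {s: v for s, v in slave_versions.items()
                 if v >= required_version}
        if ready:
            # Among ready slaves, pick the least advanced one, keeping
            # fresher slaves free to serve newer snapshots.
            return min(ready, key=ready.get)
        # No slave is ready: pick the most up-to-date to minimize waiting.
        return max(slave_versions, key=slave_versions.get)

    print(schedule_read(42, {"s1": 40, "s2": 45, "s3": 50}))  # -> s2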
Sprint: a middleware for high-performance transaction processing
In EuroSys ’07: Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems, 2007.
"... Sprint is a middleware infrastructure for high performance and high availability data management. It extends the functionality of a standalone in-memory database (IMDB) server to a cluster of commodity shared-nothing servers. Applications accessing an IMDB are typically limited by the memory capacit ..."
Cited by 30 (3 self).
Sprint is a middleware infrastructure for high-performance and highly available data management. It extends the functionality of a standalone in-memory database (IMDB) server to a cluster of commodity shared-nothing servers. Applications accessing an IMDB are typically limited by the memory capacity of the machine running the IMDB. Sprint partitions and replicates the database into segments and stores them on several data servers. Applications are then limited by the aggregate memory of the machines in the cluster. Transaction synchronization and commitment rely on total-order multicast. Unlike previous approaches, Sprint does not require accurate failure detection to ensure strong consistency, allowing fast reaction to failures. Experiments conducted on a cluster with 32 data servers using TPC-C and a micro-benchmark showed that Sprint can provide very good performance and scalability.
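As a toy illustration of the partition-and-replicate layout (the segment count and placement below are invented, not Sprint's policy), a key can be hashed to a segment and routed to the data servers holding that segment:

    # Sketch of Sprint-style partitioning (simplified): the database is
    # split into segments, each segment is stored on a subset of data
    # servers, and keys are routed to segments by hashing.
    import hashlib

    SEGMENTS = 4
    REPLICAS = {0: ["ds1", "ds2"], 1: ["ds2", "ds3"],
                2: ["ds3", "ds4"], 3: ["ds4", "ds1"]}  # illustrative layout

    def segment_of(key: str) -> int:
        digest = hashlib.sha1(key.encode()).digest()
        return digest[0] % SEGMENTS

    def servers_for(key: str):
        return REPLICAS[segment_of(key)]

    print(servers_for("customer:1042"))  # data servers holding this key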
When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication
In Proceedings of the International Conference on Distributed Computing Systems (ICDCS), 2012.
"... Abstract—In this article we introduce GMU, a genuine partial replication protocol for transactional systems, which exploits an innovative, highly scalable, distributed multiversioning scheme. Unlike existing multiversion-based solutions, GMU does not rely on a global logical clock, which represents ..."
Cited by 29 (16 self).
In this article we introduce GMU, a genuine partial replication protocol for transactional systems, which exploits an innovative, highly scalable, distributed multiversioning scheme. Unlike existing multiversion-based solutions, GMU does not rely on a global logical clock, which represents a contention point and can limit system scalability. Also, GMU never aborts read-only transactions and spares them from distributed validation schemes. This makes GMU particularly efficient in the presence of read-intensive workloads, as is typical of a wide range of real-world applications. GMU guarantees the Extended Update Serializability (EUS) isolation level. This consistency criterion is particularly attractive, as it is sufficiently strong to ensure correctness even for very demanding applications (such as TPC-C), but weak enough to allow efficient and scalable implementations, such as GMU. Further, unlike several relaxed consistency models proposed in the literature, EUS has simple and intuitive semantics, making it an attractive, scalable consistency model for ordinary programmers. We integrated the GMU protocol into a popular open-source in-memory transactional data grid, namely Infinispan. On the basis of a large-scale experimental study performed on heterogeneous experimental platforms and using industry-standard benchmarks (namely TPC-C and YCSB), we show that GMU achieves linear scalability and introduces negligible overheads (less than 10%) with respect to solutions ensuring non-serializable semantics, in a wide range of workloads.
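To give a flavor of multiversion reads without a global clock (a heavy simplification of GMU's actual protocol; names and data are ours), the sketch below tags each version with its origin node's local timestamp and lets a transaction read the newest version visible in its vector-clock snapshot:

    # Sketch of reading from a vector-clock snapshot, in the spirit of
    # distributed multiversioning without a global logical clock.
    versions = {  # key -> list of (node, node_ts, value), oldest first
        "x": [("n1", 1, "a"), ("n1", 3, "b")],
        "y": [("n2", 2, "c"), ("n2", 5, "d")],
    }

    def read(key, snapshot_vector):
        # snapshot_vector: {node: max timestamp visible from that node}
        visible = [v for (node, ts, v) in versions[key]
                   if ts <= snapshot_vector.get(node, 0)]
        return visible[-1] if visible else None  # newest visible version

    snap = {"n1": 3, "n2": 2}     # a consistent per-node vector snapshot
    print(read("x", snap), read("y", snap))  # b c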
Boosting database replication scalability through partial replication and 1-copy-snapshot-isolation
In PRDC, 2007.
"... Databases have become a crucial component in modern information systems. At the same time, they have become the main bottleneck in most systems. Database replication protocols have been proposed to solve the scalability problem by scaling out in a cluster of sites. Current techniques have attained s ..."
Cited by 28 (1 self).
Databases have become a crucial component in modern information systems. At the same time, they have become the main bottleneck in most systems. Database replication protocols have been proposed to solve this scalability problem by scaling out in a cluster of sites. Current techniques have attained some degree of scalability; however, there are two main limitations to existing approaches. First, most solutions adopt a full replication model where all sites store a full copy of the database. The coordination overhead imposed by keeping all replicas consistent allows such approaches to achieve only moderate scalability. Second, most replication protocols rely on the traditional consistency criterion, 1-copy-serializability, which limits concurrency, and thus the scalability of the system. In this paper, we first analytically analyze the performance gains that can be achieved by various partial replication configurations, i.e., configurations where not all sites store all data. From there, we derive a partial replication protocol that provides 1-copy-snapshot-isolation as its correctness criterion. We have evaluated the protocol with TPC-W, and the results show better scalability than full replication.
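As a back-of-the-envelope version of such an analysis (the formula is illustrative, not the paper's model), one can estimate scale-out by charging each update to every site that stores a copy: with update fraction w and r copies per item, an n-site cluster delivers roughly n / (1 - w + w*r) times single-site capacity:

    # Toy analytical model of partial replication scale-out (illustrative,
    # not the paper's exact analysis): reads run on one site, but each
    # update must be applied at every site holding a copy.
    def scaleout(n_sites, update_fraction, copies_per_item):
        return n_sites / (1 - update_fraction
                          + update_fraction * copies_per_item)

    for r in (1, 4, 16):   # partial vs. full replication on 16 sites
        print(f"r={r:2d}: scaleout = {scaleout(16, 0.2, r):.1f}")
    # r=1 gives 16.0, r=4 gives 10.0, r=16 (full replication) only 4.0:
    # full replication pays the update cost on every site.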
Detecting and Tolerating Byzantine Faults in Database Systems
Thesis, 2008.
"... This thesis describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares answers from queries and updates on multiple replicas which are off-the-shelf database systems, to provide a single data ..."
Cited by 18 (1 self).
This thesis describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares answers from queries and updates on multiple replicas, which are off-the-shelf database systems, to provide a single database that is Byzantine fault tolerant. The scheme works when the replicas are homogeneous, but it also allows heterogeneous replication, in which replicas come from different vendors. Heterogeneous replicas reduce the impact of bugs and security compromises because they are implemented independently and are thus less likely to suffer correlated failures. A final component of the scheme is a repair mechanism that can correct the state of a faulty replica, ensuring the longevity of the scheme. The main challenge in designing a replication scheme for transaction processing …
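A minimal Python sketch of the answer-comparison idea, assuming 2f+1 replicas and hashable answers (the thesis's actual scheme, including its handling of concurrency, is far more involved): return the result that at least f+1 replicas agree on:

    # Sketch of voting over replica answers (simplified): send the same
    # query to 2f+1 off-the-shelf databases and return the answer that
    # at least f+1 agree on, masking up to f Byzantine replicas.
    from collections import Counter

    def vote(answers, f):
        # answers: results from 2f+1 replicas (hashable, e.g. row hashes)
        best, count = Counter(answers).most_common(1)[0]
        if count >= f + 1:
            return best
        raise RuntimeError("no f+1 agreement; replica repair needed")

    print(vote(["row42", "row42", "rowXX"], f=1))  # -> row42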