Results 1 - 10 of 95
Transactional storage for geo-replicated systems
- In SOSP, 2011
"... We describe the design and implementation of Walter, a key-value store that supports transactions and replicates data across distant sites. A key feature behind Walter is a new property called Parallel Snapshot Isolation (PSI). PSI allows Walter to replicate data asynchronously, while providing stro ..."
Abstract - Cited by 96 (4 self)
We describe the design and implementation of Walter, a key-value store that supports transactions and replicates data across distant sites. A key feature behind Walter is a new property called Parallel Snapshot Isolation (PSI). PSI allows Walter to replicate data asynchronously, while providing strong guarantees within each site. PSI precludes write-write conflicts, so that developers need not worry about conflict-resolution logic. To prevent write-write conflicts and implement PSI, Walter uses two new and simple techniques: preferred sites and counting sets. We use Walter to build a social networking application and port a Twitter-like application.
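To make the counting-set idea concrete, here is a minimal Python sketch of such a structure. It illustrates the concept only, not Walter's actual data type or API: the class name, methods, and the membership rule (count > 0) are assumptions for illustration.

    from collections import defaultdict

    class CountingSet:
        """Maps each element to a signed count; an element is a member
        while its count is positive (an illustrative assumption)."""
        def __init__(self):
            self.counts = defaultdict(int)

        def add(self, elem):
            self.counts[elem] += 1      # add and remove commute, so sites

        def remove(self, elem):
            self.counts[elem] -= 1      # can apply remote updates in any order

        def members(self):
            return {e for e, c in self.counts.items() if c > 0}

    # Two sites that apply the same operations in different orders converge,
    # which is why no write-write conflict-resolution logic is needed:
    ops = [("add", "alice"), ("remove", "alice"), ("add", "alice")]
    site1, site2 = CountingSet(), CountingSet()
    for name, arg in ops:
        getattr(site1, name)(arg)
    for name, arg in reversed(ops):
        getattr(site2, name)(arg)
    assert site1.members() == site2.members() == {"alice"}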
Tashkent+: Memory-aware load balancing and update filtering in replicated databases
- In EuroSys 2007: Proceedings of the 2nd European Conference on Computer Systems, 2007
"... We present a memory-aware load balancing (MALB) technique to dispatch transactions to replicas in a replicated database. Our MALB algorithm exploits knowledge of the working sets of transactions to assign them to replicas in such a way that they execute in main memory, thereby reducing disk I/O. In ..."
Abstract - Cited by 37 (7 self)
We present a memory-aware load balancing (MALB) technique to dispatch transactions to replicas in a replicated database. Our MALB algorithm exploits knowledge of the working sets of transactions to assign them to replicas in such a way that they execute in main memory, thereby reducing disk I/O. In support of MALB, we introduce a method to estimate the size and the contents of transaction working sets. We also present an optimization called update filtering that reduces the overhead of update propagation between replicas. We show that MALB greatly improves performance over other load balancing techniques, such as round robin, least connections, and locality-aware request distribution (LARD), that do not use explicit information on how transactions use memory. In particular, LARD performs well for read-only static content Web workloads, but it is inferior to MALB for database workloads because it does not handle large requests efficiently. MALB combined with update filtering further boosts performance over LARD. We build a prototype replicated system, called Tashkent+, with which we demonstrate that MALB and update filtering improve the performance of the TPC-W and RUBiS benchmarks. In particular, in a 16-replica cluster running the ordering mix of TPC-W, MALB doubles the throughput over least connections and improves throughput by 52% over LARD. MALB with update filtering further improves throughput to triple that of least connections and more than double that of LARD. Our techniques exhibit super-linear speedup: the throughput of the 16-replica cluster is 37 times the peak throughput of a standalone database, due to better use of the cluster’s memory.
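As a rough illustration of the memory-aware idea, the Python sketch below groups transaction types onto replicas so that each group's combined working set fits in replica memory, then dispatches by type. The first-fit packing, the RAM figure, and the working-set numbers are assumptions for illustration, not Tashkent+'s actual estimator or algorithm.

    REPLICA_RAM_MB = 3072   # assumed per-replica buffer-pool budget

    def assign_types(working_sets, ram=REPLICA_RAM_MB):
        """working_sets: {txn_type: estimated_mb} -> {replica_id: [txn_types]}.
        First-fit decreasing packing, a simplification of MALB's grouping."""
        groups, free = {}, {}
        for ttype, mb in sorted(working_sets.items(), key=lambda kv: -kv[1]):
            target = next((r for r in groups if free[r] >= mb), None)
            if target is None:              # no group has room: open a replica
                target = len(groups)
                groups[target], free[target] = [], ram
            groups[target].append(ttype)
            free[target] -= mb
        return groups

    # 'order' (2500 MB) lands alone; 'search' and 'browse' share a replica,
    # so each replica's combined working set fits in main memory. Dispatching
    # each transaction to a replica hosting its type then keeps the data it
    # touches resident in that replica's memory.
    routing = assign_types({"browse": 900, "order": 2500, "search": 1400})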
Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers
- IEEE Trans. on Dependable and Secure Computing
"... Copyright & reuse City University London has developed City Research Online so that its users may access the research outputs of City University London's staff. Copyright © and Moral Rights for this paper are retained by the individual author(s) and / or other copyright holders. All materia ..."
Cited by 35 (9 self)
WS-Replication: a framework for highly available web services
- In WWW, 2006
"... Due to the rapid acceptance of web services and its fast spreading, a number of mission-critical systems will be deployed as web services in next years. The availability of those systems must be guaranteed in case of failures and network disconnections. An example of web services for which availabil ..."
Abstract - Cited by 34 (1 self)
Due to the rapid acceptance and fast spread of web services, a number of mission-critical systems will be deployed as web services in the coming years. The availability of those systems must be guaranteed in the event of failures and network disconnections. Examples of web services for which availability will be a crucial issue are those belonging to the coordination web service infrastructure, such as web services for transactional coordination (e.g., WS-CAF and WS-Transaction). These services should remain available despite site and connectivity failures to enable business interactions on a 24x7 basis. A common technique for attaining availability is clustering. However, in an Internet setting a domain can become partitioned from the network due to a link overload or other connectivity problems. The unavailability of a coordination service impacts the availability of all ...
Sprint: a middleware for high-performance transaction processing
- In EuroSys ’07: Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems, 2007
"... Sprint is a middleware infrastructure for high performance and high availability data management. It extends the functionality of a standalone in-memory database (IMDB) server to a cluster of commodity shared-nothing servers. Applications accessing an IMDB are typically limited by the memory capacit ..."
Abstract - Cited by 30 (3 self)
Sprint is a middleware infrastructure for high-performance and highly available data management. It extends the functionality of a standalone in-memory database (IMDB) server to a cluster of commodity shared-nothing servers. Applications accessing an IMDB are typically limited by the memory capacity of the machine running the IMDB. Sprint partitions and replicates the database into segments and stores them on several data servers. Applications are then limited only by the aggregated memory of the machines in the cluster. Transaction synchronization and commitment rely on total-order multicast. Unlike previous approaches, Sprint does not require accurate failure detection to ensure strong consistency, allowing fast reaction to failures. Experiments conducted on a cluster with 32 data servers using TPC-C and a micro-benchmark showed that Sprint can provide very good performance and scalability.
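The sketch below illustrates, under stated assumptions, how commitment over total-order multicast can yield identical decisions at all data servers: every server delivers certification requests in the same global order and applies the same deterministic test, so no agreement on failure detection is needed for the outcome. The sequencer stand-in and the certification test are illustrative assumptions, not Sprint's protocol.

    import itertools

    class Sequencer:
        """Stand-in for a total-order multicast: stamps a global order."""
        seq = itertools.count(1)

        @classmethod
        def multicast(cls, servers, txn):
            order = next(cls.seq)
            # every server delivers (order, txn) in the same sequence
            return [s.deliver(order, txn) for s in servers]

    class DataServer:
        def __init__(self):
            self.history = []          # (order, writeset) of committed txns

        def deliver(self, order, txn):
            # deterministic certification: abort if a transaction committed
            # after txn's snapshot wrote an item that txn read
            conflict = any(o > txn["snapshot"] and ws & txn["reads"]
                           for o, ws in self.history)
            if not conflict:
                self.history.append((order, txn["writes"]))
            return not conflict        # identical decision at every server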
P-Store: Genuine partial replication in wide area networks, 2010
"... Abstract—Partial replication is a way to increase the scala-bility of replicated systems: updates only need to be applied to a subset of the system’s sites, thus allowing replicas to handle independent parts of the workload in parallel. In this paper, we propose P-Store, a partially replicated key-v ..."
Abstract - Cited by 29 (6 self)
Partial replication is a way to increase the scalability of replicated systems: updates only need to be applied to a subset of the system’s sites, thus allowing replicas to handle independent parts of the workload in parallel. In this paper, we propose P-Store, a partially replicated key-value store for wide area networks. In P-Store, each transaction T optimistically executes on one or more sites and is then certified to guarantee serializability of the execution. The certification protocol is genuine: it only involves sites that replicate data items read or written by T, and it incorporates a mechanism to minimize a convoy effect. P-Store makes thrifty use of an atomic multicast service to guarantee correctness: no messages need to be multicast during T’s execution, and a single message is multicast to certify T. In case T is global, that is, its execution is distributed across different geographical locations, an extra vote phase is required. Our approach may offer better scalability than previously proposed solutions that either require multiple atomic multicast messages to execute T or are non-genuine. Experimental evaluations reveal that the convoy effect plays an important role even when one percent of the transactions are global. We also compare the scalability of our approach to a fully replicated solution as the proportion of global transactions and the number of sites vary.
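A minimal sketch of the "genuine" property, assuming a placement map from items to sites and an atomic-multicast primitive supplied by the caller; the names and the unanimous-vote rule are illustrative, not P-Store's actual protocol:

    def involved_sites(items, placement):
        """placement: {item: set of sites}; union over the accessed items."""
        sites = set()
        for item in items:
            sites |= placement[item]
        return sites

    def certify(txn, placement, atomic_multicast):
        # Genuineness: the single CERTIFY message is addressed only to sites
        # that replicate items T read or wrote, never to the whole system.
        targets = involved_sites(txn["reads"] | txn["writes"], placement)
        votes = atomic_multicast(targets, ("CERTIFY", txn))
        # For a global T the paper adds an extra vote phase; this sketch
        # simply requires unanimous commit votes from the involved sites.
        return all(votes)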
Boosting database replication scalability through partial replication and 1-copy-snapshot-isolation
- In PRDC’07
"... Databases have become a crucial component in modern information systems. At the same time, they have become the main bottleneck in most systems. Database replication protocols have been proposed to solve the scalability problem by scaling out in a cluster of sites. Current techniques have attained s ..."
Abstract - Cited by 28 (1 self)
Databases have become a crucial component in modern information systems. At the same time, they have become the main bottleneck in most systems. Database replication protocols have been proposed to solve the scalability problem by scaling out in a cluster of sites. Current techniques have attained some degree of scalability; however, there are two main limitations to existing approaches. First, most solutions adopt a full replication model where all sites store a full copy of the database. The coordination overhead imposed by keeping all replicas consistent allows such approaches to achieve only medium scalability. Second, most replication protocols rely on the traditional consistency criterion, 1-copy-serializability, which limits concurrency, and thus the scalability, of the system. In this paper, we first analytically analyze the performance gains that can be achieved by various partial replication configurations, i.e., configurations where not all sites store all data. From there, we derive a partial replication protocol that provides 1-copy-snapshot-isolation as its correctness criterion. We have evaluated the protocol with TPC-W, and the results show better scalability than full replication.
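The core of a snapshot-isolation certification test is the write-write check: a transaction may commit only if no transaction that committed after its snapshot wrote an overlapping item. The sketch below shows that check in Python; the timestamp bookkeeping is an illustrative assumption, and under partial replication each site would run it only over the items it stores.

    next_ts = 0
    committed = []     # (commit_ts, writeset) pairs, in commit order

    def certify_si(snapshot_ts, writeset):
        """Commit iff no transaction that committed after our snapshot
        wrote an overlapping item (the snapshot-isolation ww-check)."""
        global next_ts
        if any(ts > snapshot_ts and ws & writeset for ts, ws in committed):
            return False               # write-write conflict: abort
        next_ts += 1
        committed.append((next_ts, writeset))
        return True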
Middleware-based Database Replication: The Gaps Between Theory and Practice, 2008
"... The need for high availability and performance in data management systems has been fueling a long running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solution ..."
Abstract - Cited by 27 (0 self)
The need for high availability and performance in data management systems has been fueling a long-running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. Over time, this has created a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. In this way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other.
Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database
"... Abstract — In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared nothing database. Rather than aborting and restarting queries, our system, Osprey, divides running queries into subqueries, and replicates data such that each subquery can be re ..."
Abstract - Cited by 23 (0 self)
In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared-nothing database. Rather than aborting and restarting queries, our system, Osprey, divides running queries into subqueries, and replicates data such that each subquery can be rerun on a different node if the node initially responsible fails or returns too slowly. Our approach is inspired by the fault tolerance properties of MapReduce, in which map or reduce jobs are greedily assigned to workers, and failed jobs are rerun on other workers. Osprey is implemented using a middleware approach, with only a small amount of custom code to handle cluster coordination. Each node in the system is a discrete database system running on a separate machine. Data, in the form of tables, is partitioned amongst database nodes, and each partition is replicated on several nodes using a technique called chained declustering [1]. A coordinator machine acts as a standard SQL interface to users; it transforms an input SQL query into a set of subqueries that are then executed on the nodes. Each subquery represents only a small fraction of the total execution of the query; worker nodes are assigned a new subquery as they finish their current one. In this greedy approach, the amount of work lost due to node failure is small (at most one subquery’s work), and the system is automatically load balanced, because slow nodes will be assigned fewer subqueries. We demonstrate Osprey’s viability as a distributed system for a small data warehouse data set and workload. Our experiments show that the overhead introduced by the middleware is small compared to the workload, and that the system shows promising load balancing and fault tolerance properties.
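The greedy, MapReduce-style scheduling the abstract describes can be sketched as follows: a query becomes one subquery per partition, each subquery may run on any node holding a replica of its partition (which chained declustering provides), and a failed subquery is simply retried elsewhere. Function names and the retry policy are assumptions for illustration, not Osprey's interfaces.

    from collections import deque

    def run_query(subqueries, replica_map, execute):
        """subqueries: [(partition, sql)]; replica_map: {partition: [nodes]};
        execute(node, sql) returns rows, or None on failure/timeout."""
        pending = deque(subqueries)
        results = []
        while pending:
            partition, sql = pending.popleft()
            for node in replica_map[partition]:  # any replica of the partition
                rows = execute(node, sql)
                if rows is not None:
                    results.append(rows)
                    break
            else:
                # every replica failed: re-queue and retry (a real system
                # would bound retries and detect permanently failed nodes)
                pending.append((partition, sql))
        return results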
Scalable deferred update replication, 2012
"... Abstract—Deferred update replication is a well-known ap-proach to building data management systems as it provides both high availability and high performance. High availability comes from the fact that any replica can execute client transactions; the crash of one or more replicas does not interrupt ..."
Abstract - Cited by 18 (6 self)
Deferred update replication is a well-known approach to building data management systems, as it provides both high availability and high performance. High availability comes from the fact that any replica can execute client transactions; the crash of one or more replicas does not interrupt the system. High performance comes from the fact that only one replica executes a transaction; the others must only apply its updates. Since replicas execute transactions concurrently, transaction execution is distributed across the system. The main drawback of deferred update replication is that update transactions scale poorly with the number of replicas, although read-only transactions scale well. This paper proposes an extension to the technique that improves the scalability of update transactions. In addition to presenting a novel protocol, we detail its implementation and provide an extensive analysis of its performance.
Keywords: database replication, scalable data store, fault tolerance, high performance, transactional systems
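For reference, the baseline deferred update replication cycle the paper extends looks roughly like the sketch below: one replica executes a transaction and records its read versions and writeset; every replica then certifies the writeset in the same delivery order and, on commit, applies the updates. The data structures and the version-based certification are illustrative assumptions, not the paper's protocol.

    class Replica:
        def __init__(self):
            self.store = {}            # key -> (value, version)
            self.version = 0

        def execute(self, reads, writes):
            """Run at ONE replica only: record read versions and writeset."""
            readset = {k: self.store.get(k, (None, 0))[1] for k in reads}
            return {"reads": readset, "writes": writes}

        def deliver(self, ws):
            """Run at EVERY replica, in the same total order."""
            stale = any(self.store.get(k, (None, 0))[1] > v
                        for k, v in ws["reads"].items())
            if stale:
                return False           # certification fails: abort everywhere
            self.version += 1
            for k, val in ws["writes"].items():
                self.store[k] = (val, self.version)
            return True                # every replica reaches the same decision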